All Content tagged with AWS Glue
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
We are trying to read a CSV file and process the data using AWS Glue, and we are getting the error message below:
Py4JJavaError: An error occurred while calling o91.schema.
:...
How can I monitor who is querying which Glue tables?
After some trial and error I found that the BatchGetTable Glue API event is recorded using CloudTrail every time I run an Athena query, and...
I have an Athena Iceberg table.
The table has 2 partitions.
Each hour I update it with `MERGE` and `DELETE` commands.
```
SELECT count(*) FROM "my_table$files"
```
now gives 16. Meanwhile data...
I have a bunch of parquet files in a flat S3 folder, no partitions:...
Hi. I had a table that was created by a crawler. I then deleted the table (in Athena) and recreated it with DDL. After I ran the crawler again, it could not find the table and created a new one.
Note: The S3...
I have a few text files on S3 that I need to add to the Glue Catalog in order to use them in a job. None of them have separators, they are all fixed-width files. I have the schemas, but the crawler...
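For readers with the same fixed-width problem, here is a minimal sketch of slicing one record into named fields from a list of column widths. It is plain Python rather than a Glue or crawler feature, and the schema, column names, and sample line are hypothetical placeholders, not taken from the question.

```python
from itertools import accumulate

# Hypothetical schema: (column name, width in characters).
SCHEMA = [("customer_id", 6), ("name", 10), ("balance", 8)]


def parse_fixed_width(line, schema):
    """Slice one fixed-width record into a dict, stripping pad spaces."""
    widths = [width for _, width in schema]
    ends = list(accumulate(widths))   # exclusive end offset of each column
    starts = [0] + ends[:-1]          # start offset of each column
    return {
        name: line[start:end].strip()
        for (name, _), start, end in zip(schema, starts, ends)
    }


# Made-up sample line: 6 + 10 + 8 characters.
record = parse_fixed_width("000042Jane Doe  00123.45", SCHEMA)
```

In a job, the same offsets could presumably be applied after reading each line as a single string column, but that wiring depends on details the question does not show.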
Hello,
I am currently working with AWS Glue ETL Jobs and encountered an issue where the "Push to repository" and "Pull from repository" options are disabled when trying to push the script/job to...
I created a custom visual transform component and put the needed JSON and Python files in S3. The component loaded as expected. Later, I needed to make some more adjustments to the parameters...
I have a Glue PySpark script that processes DDB data exported to S3 and writes it to Redshift. Initially, it used the logic below:
```
redshiftConnectionOptions = {
"postactions": "BEGIN; MERGE...
```
I just can't understand what I'm doing wrong.
I have a table.
```
CREATE EXTERNAL TABLE test (
originalrequest string,
requeststarted string
)
PARTITIONED BY (
req_start_partition...
```
I have been writing a CloudFormation stack in `yaml` and deploying it to AWS infrastructure (for legacy reasons, I unfortunately cannot switch to CDK ;))
The following yaml code is part of the...
I got this error when trying to insert from a temp internal table into an external table.
ERROR: Invalid DataCatalog response for external table "reportdb"."logs_aggregated": Cannot deserialize Table. Error:...