All Content tagged with AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.

2010 results
Our current setup runs the AWS Glue job in us-west-1, where our raw Parquet data is also stored. However, our target AWS Glue Data Catalog (for Iceberg table definitions) resides in us-east-2. By defa...
2
answers
0
votes
144
views
asked 2 months ago
I want to add only two columns, EXECUTION_MARKET and TA_NUMBERSYS, from the database table DSUSER00.SECTANR in a Glue Python script, but I am not sure how to do it. Below is my code: ``` import sys from awsgl...
2
answers
0
votes
38
views
asked 2 months ago
I want to apply a filter to the table _test_rw_omuc_baag__test1_app **where to_date('19700101','yyyymmdd') + (((AENDERUNGSDATLO/60)/60)/24) > to_date('20220101','yyyymmdd')** in a Glue Python script but no...
2
answers
0
votes
63
views
asked 2 months ago
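The filter in the question above adds AENDERUNGSDATLO, divided down from seconds to days, to 1970-01-01 and compares the result with 2022-01-01. A minimal sketch of the same arithmetic in Python follows, assuming the column holds whole seconds since 1970-01-01 with no time zone offset (the question does not confirm this). The helper names `epoch_seconds` and `build_filter` are hypothetical and used only for illustration.

```python
from datetime import datetime, timezone


def epoch_seconds(yyyymmdd: str) -> int:
    """Return seconds between 1970-01-01 and the given yyyymmdd date (UTC assumed)."""
    dt = datetime.strptime(yyyymmdd, "%Y%m%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def build_filter(column: str, yyyymmdd: str) -> str:
    """Build a SQL predicate comparing a seconds-since-epoch column to a date.

    Equivalent to: to_date('19700101') + column/86400 > to_date(yyyymmdd),
    under the assumption that the column stores seconds since 1970-01-01.
    """
    return f"{column} > {epoch_seconds(yyyymmdd)}"


threshold = epoch_seconds("20220101")
predicate = build_filter("AENDERUNGSDATLO", "20220101")
print(predicate)
```

If the assumption holds, a predicate string of this form could be passed to a Spark DataFrame's `filter` method as a SQL expression, which avoids the date arithmetic on every row.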
Dears, I am trying to create an Iceberg table using an Athena query as shown: CREATE TABLE test_kafka.my_record_iceberg ( index string, userid string, firstname string, lastname strin...
2
answers
0
votes
218
views
asked 2 months ago
In Glue 4 I can run a job using Kafka as the writer format: ``` .write \ .format('kafka') \ .option('kafka.bootstrap.servers', args["kafka_bootstrap_servers"]) \ .option('kafka.security.protoc...
2
answers
0
votes
101
views
asked 2 months ago
I am using a Python module in AWS Glue ![JOB Parameter for additional python Libraries](/media/postImages/original/IMPZIXVmifQBCr0wEv0WKHVw) But the Glue job fails randomly ![Glue Error](/media/postImag...
1
answers
0
votes
177
views
asked 2 months ago
Hello, I have the following data pipeline: RDS DB → DMS → S3 (Parquet) ↔ Glue Data Catalog ← Redshift (Spectrum). I'm using an AWS Glue Crawler to create tables for the external database (based on Par...
1
answers
0
votes
72
views
asked 2 months ago
I am emitting records from a Lambda function using the firehose.put_record_batch function. However, Firehose raises a delivery error even though the Python record exactly matches the schema of the table. identifie...
1
answers
0
votes
74
views
asked 2 months ago
Hello, all. We recently migrated our jobs from Glue 4.0 to 5.0, and we use PySpark. We noticed that our jobs fail with an error: ``` Suppressed: software.amazon.awssdk.core.exception.SdkClientException: R...
Accepted Answer
AWS Glue
2
answers
0
votes
375
views
asked 2 months ago
Hello, all. We recently migrated our jobs from Glue 4.0 to 5.0, and we use PySpark. According to the documentation, the Glue 5 dependencies are: Spark 3.5.2, Python 3.11, Scala 2.12.18. We use Spark-Excel 2.12-3.5.1_0.20.4 as th...
1
answers
2
votes
90
views
asked 2 months ago
We received a notification that we are using deprecated Amazon Athena APIs or use a deprecated Athena data catalog resource. (In this case, it appears to be a data catalog) Using the information prov...
1
answers
0
votes
100
views
asked 2 months ago
JSON data with a nested structure is streamed into Firehose via Direct PUT and delivered to S3 using a configured Glue schema. When querying via Athena or inspecting the S3 files, we noticed that only...
1
answers
0
votes
101
views
asked 2 months ago