Questions tagged with AWS Glue

Glue Crawler Include Path

Hi all, I set up a crawler that is giving me headaches when it comes to the "Include path". My path currently looks something like this: databaseName/schema/%_qt_%. This works fine, meaning that the...

AWS Glue

170 views

leticat

asked 4 months ago

Custom parameters to visual ETL job

I want to use Glue Studio to create a Glue ETL job. The job needs to filter the data in its first step based on input parameters given to it at run time. Is there a way with visual ETL...

Accepted Answer

AWS Glue

460 views

Anshu

asked 4 months ago

Glue python repartition while retaining old partition column

I have data currently partitioned on a key (say cluster) and I'm repartitioning to a new key 'date'. So I do (in Python) ``` df = glueContext.create_dynamic_frame.from_options(...) df =...

AWS Glue

187 views

rePost-User-3866371

asked 4 months ago

AWS Data Catalog table index not working

Hello, for an AWS Data Catalog table, I ran Glue (structure: Amazon S3 -> Change Schema -> AWS Glue Data Catalog) and populated the table with only string records. All the actions were done from the...

Accepted Answer

AWS Glue, Extract Transform & Load Data

178 views

rePost-User-4717319

asked 4 months ago

AWS Glue Studio Visual Editor Data Preview changing schema data types incorrectly

We used the default XML crawler to crawl a file, and it correctly created a table and schema for the data (relevant column shown): ![Correct...

AWS Glue

142 views

jeff

asked 5 months ago

Handle de-dup in Glue Job PySpark

Hello, I am using PySpark in a Glue job to do ETL on a table sourced from S3, which in turn is sourced from MySQL via DMS (table schema as below; columns 'op', 'row_updated_timestamp' and 'row_commit_timestamp' are...

AWS Glue, Extract Transform & Load Data

137 views

rePost-User-1943247

asked 5 months ago

How to fix the unknown schema datatype of AWS Glue Table?

There was a data source (JSON files) in S3. The JSON structure is as follows. I used AWS Glue Crawler to build the Glue table based on this S3 data source. I think the "data" column should be "Struct"...

Accepted Answer

AWS Glue

414 views

CharlieWu

asked 5 months ago

Insufficient Lake Formation permission(s) on mock_data_patient (Database name: crawl_db, Table Name: mock_data_patient)

Crawler Error: Insufficient Lake Formation permission(s) on mock_data_patient (Database name: crawl_db, Table Name: mock_data_patient) (Service: AWSGlue; Status Code: 400; Error Code:...

Accepted Answer

AWS Identity and Access Management, AWS Glue, AWS Lake Formation

203 views

Omkar

asked 5 months ago

Glue Visual ETL: Can't copy raw data from RDS MySQL to S3 bucket due to unclassified error: Schema specified that header line is to be written; but contains no column names

I'm trying to build an ETL pipeline with AWS Glue, and the first step is to copy raw data from the original source to a staging bucket. The job is rather simple: source is a data catalog table (from...

Accepted Answer

AWS Glue, Extract Transform & Load Data

285 views

NLopeDeBarrios

asked 5 months ago

Glue ETL AccessDeniedException for nonexistent Lake Formation

Hello, in a Glue ETL job made of the nodes Amazon S3, Change Schema, and AWS Glue Data Catalog (with the table "us_spending" backed by S3), I get the following error: > Error Category: PERMISSION_ERROR;...

Accepted Answer

AWS Glue, Extract Transform & Load Data, AWS Lake Formation

230 views

rePost-User-4717319

asked 5 months ago

Pass parameter from one glue job to another in step function?

I am looking for the best way to pass a parameter from one Glue job to another within a step function. Each day, I will receive a file. In the file there will be data for certain dates. The first...

AWS Step Functions, AWS Glue, Extract Transform & Load Data

895 views

rpost

asked 5 months ago

Using AWS Glue to export ~500TB of DynamoDB table to S3 bucket

We have a use case where we want to export ~500 TB of DynamoDB data to S3. One of the possible approaches that I found was to use an AWS Glue job. Also, while exporting the data to S3, we need to...

AWS Glue

319 views

shasnk

asked 5 months ago