Questions tagged with AWS Glue
Browse through the questions and answers listed below or filter and sort to narrow down your results.
Trying to figure out if it's possible to use an AWS Glue crawler to parse the Spark stderr logs that are dumped from EMR Serverless.
The logs are space delimited. I tried running a crawler against the...
0 answers · 0 votes · 156 views · asked 3 months ago
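For space-delimited logs like these, the built-in classifiers often fail, and a custom grok classifier is a common workaround. Below is a hedged sketch of the `GrokClassifier` parameters for Glue's `CreateClassifier` API; the pattern is an illustrative guess at a Spark stderr line (timestamp, log level, message), not the actual EMR Serverless format, so it would need adjusting against real log samples.

```
{
  "GrokClassifier": {
    "Name": "emr-serverless-stderr",
    "Classification": "spark_stderr",
    "GrokPattern": "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:message}"
  }
}
```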
Hello,
While building a job in AWS Glue (Amazon S3, Change Schema, AWS Glue Data Catalog), I incurred a surprisingly high cost for the data preview session (AWS Glue GlueInteractiveSession), 91% of the total...
1 answer · 0 votes · 178 views · asked 3 months ago
I encountered the following error, “Parquet column cannot be converted in file, Pyspark Expected string Found: INT32.”
I tried converting the column to INT32 (applying withColumn()), but the error...
1 answer · 0 votes · 661 views · asked 3 months ago
Hi All,
I set up a crawler, which is giving me headaches when it comes to the "Include path". My path currently looks something like this:
databaseName/schema/%_qt_%
This works fine, meaning that the...
1 answer · 0 votes · 148 views · asked 3 months ago
I want to use Glue Studio to create a Glue ETL job. This job needs to filter out the data in its first step based on the input parameters given to it at run time. Is there a way with visual ETL...
Accepted Answer · AWS Glue
2 answers · 0 votes · 329 views · asked 3 months ago
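In a Glue script, runtime parameters are typically read with `getResolvedOptions` from `awsglue.utils` and then fed into a Filter transform. Since `awsglue` isn't available outside a Glue environment, here is a minimal pure-Python stand-in (the function name `resolve_options` and the parameter `filter_value` are illustrative, not part of the Glue API) showing the argument-parsing idea:

```python
def resolve_options(argv, names):
    # Minimal stand-in for awsglue.utils.getResolvedOptions:
    # pick out "--name value" pairs from the job arguments.
    args = {}
    for i, tok in enumerate(argv):
        for name in names:
            if tok == f"--{name}":
                args[name] = argv[i + 1]
    return args

# Example: a job launched with a runtime filter parameter.
argv = ["job.py", "--JOB_NAME", "demo", "--filter_value", "2024-01"]
opts = resolve_options(argv, ["JOB_NAME", "filter_value"])
print(opts["filter_value"])  # -> 2024-01, the value a Filter step would use
```

In Glue Studio, the equivalent is passing `--filter_value` as a job parameter and referencing it inside a Custom Transform or Filter node.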
I have data currently partitioned on a key (say cluster) and I'm repartitioning to a new key 'date'. So I do (in Python)
```
df = glueContext.create_dynamic_frame.from_options(...)
df =...
1 answer · 0 votes · 162 views · asked 3 months ago
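Repartitioning on a new key in Glue usually comes down to writing the frame out with `partitionKeys=["date"]` in the sink options. The toy example below (plain Python, invented sample records) shows the regrouping that implies: rows originally spread across `cluster` partitions end up bucketed by `date` instead.

```python
from collections import defaultdict

# Toy records previously partitioned by "cluster", regrouped by "date" --
# the same regrouping Glue performs when writing with partitionKeys=["date"].
records = [
    {"cluster": "a", "date": "2024-01-01", "value": 1},
    {"cluster": "b", "date": "2024-01-01", "value": 2},
    {"cluster": "a", "date": "2024-01-02", "value": 3},
]
by_date = defaultdict(list)
for rec in records:
    by_date[rec["date"]].append(rec)

print(sorted(by_date))  # -> ['2024-01-01', '2024-01-02']
```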
Hello,
For an AWS Glue Data Catalog table, I ran a Glue job (structure: Amazon S3 -> Change Schema -> AWS Glue Data Catalog) and populated the table with only string records. All the actions were done from the...
1 answer · 0 votes · 146 views · asked 3 months ago
We have a file that we used the default XML crawler to crawl the data for, and it correctly created a table and schema for the data (relevant column shown):
![Correct...
0 answers · 0 votes · 128 views · asked 3 months ago
Hello
I am using PySpark in a Glue job to do ETL on a table sourced from S3, and S3 is sourced from MySQL via DMS (table schema as below; columns 'op', 'row_updated_timestamp' & 'row_commit_timestamp' are...
1 answer · 0 votes · 114 views · asked 3 months ago
There was a data source (JSON files) in S3. The JSON structure is as follows.
I used AWS Glue Crawler to build the Glue table based on this S3 data source.
I think the "data" column should be "Struct"...
Accepted Answer · AWS Glue
2 answers · 0 votes · 314 views · asked 3 months ago
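A commonly reported cause of a crawler inferring `string` instead of `struct` is that the `"data"` value doesn't have the same shape in every record. The check below (plain Python, with invented sample lines) is a quick way to see whether the field is consistently an object across an S3 JSON sample before blaming the crawler:

```python
import json

# Hypothetical sample of the S3 JSON lines. If "data" is an object in every
# record, a struct column is plausible; a single string-valued record would
# push the inferred type toward string.
lines = [
    '{"id": 1, "data": {"name": "x"}}',
    '{"id": 2, "data": {"name": "y"}}',
]
types = {type(json.loads(line)["data"]).__name__ for line in lines}
print(types)  # -> {'dict'}: the field is consistently an object here
```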
Crawler Error:
Insufficient Lake Formation permission(s) on mock_data_patient (Database name: crawl_db, Table Name: mock_data_patient) (Service: AWSGlue; Status Code: 400; Error Code:...
1 answer · 0 votes · 168 views · asked 3 months ago
I'm trying to build an ETL pipeline with AWS Glue, and the first step is to copy raw data from the original source to a staging bucket. The job is rather simple: source is a data catalog table (from...
1 answer · 0 votes · 227 views · asked 3 months ago