Questions tagged with AWS Glue
Trying to figure out if it's possible to use an AWS Glue crawler to parse the Spark stderr logs that EMR Serverless dumps.
The logs are space-delimited. I tried running a crawler against the...
0 answers · 0 votes · 164 views · asked 4 months ago
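For space-delimited logs like these, the usual crawler route is a custom (grok) classifier. As a minimal sketch of the same idea, here is the field structure expressed as a Python regex — the line format (`date time LEVEL message`) and the sample line are assumptions, not taken from the actual EMR Serverless output:

```python
import re

# Assumed Spark stderr line shape: "<date> <time> <LEVEL> <message>".
# A Glue custom classifier would express this same structure as a grok pattern.
LOG_PATTERN = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)

def parse_line(line: str) -> dict:
    """Split one space-delimited log line into named fields, or return {}."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else {}

fields = parse_line("24/01/15 10:32:01 INFO SparkContext: Running Spark version 3.4.1")
```

If the regex matches cleanly on sample lines, the equivalent grok pattern should work as a custom classifier; lines that fall outside the pattern are what typically make the crawler fall back to a single-column table.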
Hello,
While building a job in AWS Glue (Amazon S3 -> Change Schema -> AWS Glue Data Catalog), I saw a surprising cost for the data preview session (AWS Glue GlueInteractiveSession): 91% of the total...
1 answer · 0 votes · 213 views · asked 4 months ago
I encountered the following error: "Parquet column cannot be converted in file, Pyspark Expected string Found: INT32."
I tried to convert the column to INT32 (applying withColumn()), but the error...
1 answer · 0 votes · 876 views · asked 4 months ago
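This error usually means the opposite cast is needed: the table schema declares the column as string while the Parquet files hold INT32, so the fix is to cast values toward the declared type (in the job itself, something like `df.withColumn("id", col("id").cast("string"))`). A minimal stdlib sketch of that coercion idea — the column names and sample rows here are hypothetical:

```python
# Coerce each row to the types the table schema declares, assuming the catalog
# says string while the files hold ints. In a real Glue job this would be a
# PySpark cast, e.g. withColumn("id", col("id").cast("string")).
DECLARED_SCHEMA = {"id": str, "name": str}  # hypothetical declared schema

def coerce(row: dict) -> dict:
    """Cast each value to the type the declared schema expects."""
    return {k: DECLARED_SCHEMA[k](v) for k, v in row.items()}

rows = [{"id": 42, "name": "a"}, {"id": 7, "name": "b"}]  # INT32-like values
coerced = [coerce(r) for r in rows]
```

Casting the other way (to INT32) keeps the file and the catalog in disagreement, which is why the error persists.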
Hi All,
I set up a crawler, which is giving me headaches when it comes to the "Include path". My path currently looks something like this:
databaseName/schema/%_qt_%
This works fine, meaning that the...
1 answer · 0 votes · 167 views · asked 4 months ago
I want to use Glue Studio to create a Glue ETL job. This job needs to filter the data in its first step based on input parameters given to it at run time. Is there a way with visual ETL...
Accepted Answer · AWS Glue
2 answers · 0 votes · 437 views · asked 4 months ago
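Run-time parameters reach a Glue job as `--name value` command-line arguments and are normally read with `awsglue.utils.getResolvedOptions(sys.argv, [...])`; the resulting value can then feed a Filter transform. Since `awsglue` is only available inside Glue, here is a minimal stand-in parser for local testing — the parameter names are hypothetical:

```python
def resolve_options(argv, expected):
    """Minimal stand-in for awsglue.utils.getResolvedOptions: Glue passes job
    parameters as '--name value' pairs on the command line."""
    opts = {}
    for name in expected:
        flag = "--" + name
        if flag in argv:
            opts[name] = argv[argv.index(flag) + 1]
    return opts

# Hypothetical invocation; 'filter_value' would be set under the job's
# "Job parameters" and used in the first Filter step.
args = resolve_options(
    ["script.py", "--JOB_NAME", "demo", "--filter_value", "2024-01"],
    ["JOB_NAME", "filter_value"],
)
```

In Glue Studio's visual editor the Filter node takes a literal condition, so the common pattern is to read the parameter in a small custom-transform node (or a script job) as above.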
I have data currently partitioned on a key (say 'cluster') and I'm repartitioning to a new key, 'date'. So I do (in Python):
```
df = glueContext.create_dynamic_frame.from_options(...)
df = ...
```
1 answer · 0 votes · 180 views · asked 4 months ago
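For context, repartitioning on write is what `glueContext.write_dynamic_frame` does when given `partitionKeys=["date"]`: rows are regrouped under the new key regardless of the old 'cluster' layout. A pure-Python sketch of that regrouping — the sample records are hypothetical:

```python
from collections import defaultdict

# What partitionKeys=["date"] means for the output layout: rows grouped by the
# new key, one output prefix per distinct value. Sample records are made up.
records = [
    {"cluster": "a", "date": "2024-01-01", "value": 1},
    {"cluster": "b", "date": "2024-01-01", "value": 2},
    {"cluster": "a", "date": "2024-01-02", "value": 3},
]

partitions = defaultdict(list)
for rec in records:
    partitions[rec["date"]].append(rec)  # e.g. s3://bucket/date=2024-01-01/
```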
Hello,
For an AWS Data Catalog table, I ran Glue (structure: Amazon S3 -> Change Schema -> AWS Glue Data Catalog) and populated the table with only string records. All the actions were done from the...
1 answer · 0 votes · 172 views · asked 4 months ago
We have a file that we crawled with the default XML crawler, and it correctly created a table and schema for the data (relevant column shown):
![Correct...
0 answers · 0 votes · 141 views · asked 4 months ago
Hello,
I am using PySpark in a Glue job to do ETL on a table sourced from S3, where S3 is in turn sourced from MySQL via DMS (table schema as below; the columns 'op', 'row_updated_timestamp' & 'row_commit_timestamp' are...
1 answer · 0 votes · 133 views · asked 4 months ago
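As background, DMS CDC output carries an 'op' column (insert/update/delete) plus update timestamps, and a common ETL step is to collapse the change stream to current state: keep the newest row per key, then drop keys whose last operation was a delete. A stdlib sketch of that dedupe — the primary-key column ('id') and sample rows are hypothetical; 'op' and 'row_updated_timestamp' are the columns named in the question:

```python
# Collapse a DMS change stream to current state: newest row per key by
# 'row_updated_timestamp' (ISO strings compare correctly lexicographically),
# then filter out keys whose final operation is a delete ('D').
changes = [
    {"id": 1, "op": "I", "row_updated_timestamp": "2024-01-01T00:00:00", "v": 1},
    {"id": 1, "op": "U", "row_updated_timestamp": "2024-01-02T00:00:00", "v": 2},
    {"id": 2, "op": "I", "row_updated_timestamp": "2024-01-01T00:00:00", "v": 9},
    {"id": 2, "op": "D", "row_updated_timestamp": "2024-01-03T00:00:00", "v": 9},
]

latest = {}
for row in changes:
    cur = latest.get(row["id"])
    if cur is None or row["row_updated_timestamp"] > cur["row_updated_timestamp"]:
        latest[row["id"]] = row

current_state = [r for r in latest.values() if r["op"] != "D"]
```

In PySpark the same logic is usually a window over the key ordered by the timestamp descending, taking the first row and filtering `op != 'D'`.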
There was a data source (JSON files) in S3. The JSON structure is as follows.
I used AWS Glue Crawler to build the Glue table based on this S3 data source.
I think the "data" column should be "Struct"...
Accepted Answer · AWS Glue
2 answers · 0 votes · 397 views · asked 4 months ago
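When a crawler infers a JSON column as string rather than struct, a frequent cause is that the sample documents disagree on the column's shape, so the crawler falls back to the lowest common type. A quick stdlib check for that — the sample documents below are hypothetical:

```python
import json

# If the "data" objects across sample files carry different key sets, a crawler
# tends to fall back to string instead of inferring a struct. Sample docs are
# made up for illustration.
docs = [
    '{"id": 1, "data": {"x": 1, "y": 2}}',
    '{"id": 2, "data": {"x": 3, "y": 4}}',
]

shapes = {tuple(sorted(json.loads(d)["data"].keys())) for d in docs}
consistent = len(shapes) == 1  # one shape -> struct inference is plausible
```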
Crawler Error:
Insufficient Lake Formation permission(s) on mock_data_patient (Database name: crawl_db, Table Name: mock_data_patient) (Service: AWSGlue; Status Code: 400; Error Code:...
1 answer · 0 votes · 199 views · asked 4 months ago
I'm trying to build an ETL pipeline with AWS Glue, and the first step is to copy raw data from the original source to a staging bucket. The job is rather simple: source is a data catalog table (from...
1 answer · 0 votes · 271 views · asked 4 months ago