Questions tagged with AWS Glue
Browse through the questions and answers listed below or filter and sort to narrow down your results.
Trying to figure out if it's possible to use an AWS Glue crawler to parse the Spark stderr logs that are dumped from EMR Serverless.
The logs are space delimited. I tried running a crawler against the...
0 answers · 0 votes · 156 views · asked 3 months ago
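For space-delimited logs like these, the built-in classifiers often fail, and a custom grok classifier is a common workaround. Below is a hedged sketch of the `GrokClassifier` parameters for Glue's `CreateClassifier` API; the pattern is an illustrative guess at a Spark stderr line (timestamp, log level, message), not the actual EMR Serverless format, so it would need adjusting against real log samples.

```
{
  "GrokClassifier": {
    "Name": "emr-serverless-stderr",
    "Classification": "spark_stderr",
    "GrokPattern": "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:message}"
  }
}
```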
Hello,
While building a job in AWS Glue (Amazon S3, Change Schema, AWS Glue Data Catalog), I incurred a surprisingly high cost for the data preview session (AWS Glue GlueInteractiveSession), 91% of the total...
1 answer · 0 votes · 178 views · asked 3 months ago
I encountered the following error, “Parquet column cannot be converted in file, Pyspark Expected string Found: INT32.”
I tried converting the column to INT32 (applying withColumn()), but the error...
1 answer · 0 votes · 661 views · asked 3 months ago
Hi All,
I set up a crawler, which is giving me headaches when it comes to the "Include path". My path currently looks something like this:
databaseName/schema/%_qt_%
This works fine, meaning that the...
1 answer · 0 votes · 148 views · asked 3 months ago
I want to use Glue Studio to create a Glue ETL job. This job needs to filter out the data in its first step based on the input parameters given to it at run time. Is there a way with visual ETL...
Accepted Answer · AWS Glue
2 answers · 0 votes · 329 views · asked 3 months ago
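In a Glue script, runtime parameters are typically read with `getResolvedOptions` from `awsglue.utils` and then fed into a Filter transform. Since `awsglue` isn't available outside a Glue environment, here is a minimal pure-Python stand-in (the function name `resolve_options` and the parameter `filter_value` are illustrative, not part of the Glue API) showing the argument-parsing idea:

```python
def resolve_options(argv, names):
    # Minimal stand-in for awsglue.utils.getResolvedOptions:
    # pick out "--name value" pairs from the job arguments.
    args = {}
    for i, tok in enumerate(argv):
        for name in names:
            if tok == f"--{name}":
                args[name] = argv[i + 1]
    return args

# Example: a job launched with a runtime filter parameter.
argv = ["job.py", "--JOB_NAME", "demo", "--filter_value", "2024-01"]
opts = resolve_options(argv, ["JOB_NAME", "filter_value"])
print(opts["filter_value"])  # -> 2024-01, the value a Filter step would use
```

In Glue Studio, the equivalent is passing `--filter_value` as a job parameter and referencing it inside a Custom Transform or Filter node.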
I have data currently partitioned on a key (say cluster) and I'm repartitioning to a new key 'date'. So I do (in Python)
```
df = glueContext.create_dynamic_frame.from_options(...)
df =...
1 answer · 0 votes · 162 views · asked 3 months ago
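Repartitioning on a new key in Glue usually comes down to writing the frame out with `partitionKeys=["date"]` in the sink options. The toy example below (plain Python, invented sample records) shows the regrouping that implies: rows originally spread across `cluster` partitions end up bucketed by `date` instead.

```python
from collections import defaultdict

# Toy records previously partitioned by "cluster", regrouped by "date" --
# the same regrouping Glue performs when writing with partitionKeys=["date"].
records = [
    {"cluster": "a", "date": "2024-01-01", "value": 1},
    {"cluster": "b", "date": "2024-01-01", "value": 2},
    {"cluster": "a", "date": "2024-01-02", "value": 3},
]
by_date = defaultdict(list)
for rec in records:
    by_date[rec["date"]].append(rec)

print(sorted(by_date))  # -> ['2024-01-01', '2024-01-02']
```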
Hello,
For an AWS Glue Data Catalog table, I ran a Glue job (structure: Amazon S3 -> Change Schema -> AWS Glue Data Catalog) and populated the table with only string records. All the actions were done from the...
1 answer · 0 votes · 146 views · asked 3 months ago
We have a file that we used the default XML crawler to crawl the data for, and it correctly created a table and schema for the data (relevant column shown):
![Correct...
0 answers · 0 votes · 128 views · asked 3 months ago
Hello
I am using PySpark in a Glue job to do ETL on a table sourced from S3, and S3 is sourced from MySQL via DMS (table schema as below; columns 'op', 'row_updated_timestamp' & 'row_commit_timestamp' are...
1 answer · 0 votes · 114 views · asked 3 months ago
There was a data source (JSON files) in S3. The JSON structure is as follows.
I used AWS Glue Crawler to build the Glue table based on this S3 data source.
I think the "data" column should be "Struct"...
Accepted Answer · AWS Glue
2 answers · 0 votes · 314 views · asked 3 months ago
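A commonly reported cause of a crawler inferring `string` instead of `struct` is that the `"data"` value doesn't have the same shape in every record. The check below (plain Python, with invented sample lines) is a quick way to see whether the field is consistently an object across an S3 JSON sample before blaming the crawler:

```python
import json

# Hypothetical sample of the S3 JSON lines. If "data" is an object in every
# record, a struct column is plausible; a single string-valued record would
# push the inferred type toward string.
lines = [
    '{"id": 1, "data": {"name": "x"}}',
    '{"id": 2, "data": {"name": "y"}}',
]
types = {type(json.loads(line)["data"]).__name__ for line in lines}
print(types)  # -> {'dict'}: the field is consistently an object here
```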
Crawler Error:
Insufficient Lake Formation permission(s) on mock_data_patient (Database name: crawl_db, Table Name: mock_data_patient) (Service: AWSGlue; Status Code: 400; Error Code:...
1 answer · 0 votes · 168 views · asked 3 months ago
I'm trying to build an ETL pipeline with AWS Glue, and the first step is to copy raw data from the original source to a staging bucket. The job is rather simple: source is a data catalog table (from...
1 answer · 0 votes · 227 views · asked 3 months ago