Questions tagged with AWS Glue
Hi All,
I set up a crawler that is giving me headaches when it comes to the "Include path". My path currently looks something like this:
databaseName/schema/%_qt_%
This works fine, meaning that the...
1 answer, 0 votes, 170 views, asked 4 months ago
I want to use Glue Studio to create a Glue ETL job. In its first step, this job needs to filter the data based on input parameters given to it at run time. Is there a way with visual ETL...
Accepted Answer | AWS Glue
2 answers, 0 votes, 460 views, asked 4 months ago
I have data currently partitioned on a key (say cluster) and I'm repartitioning to a new key 'date'. So I do (in Python)
```
df = glueContext.create_dynamic_frame.from_options(...)
df =...
```
1 answer, 0 votes, 187 views, asked 4 months ago
Hello,
For an AWS Glue Data Catalog table, I ran a Glue job (structure: Amazon S3 -> Change Schema -> AWS Glue Data Catalog) and populated the table with only string records. All the actions were done from the...
1 answer, 0 votes, 178 views, asked 4 months ago
We used the default XML crawler to crawl a file, and it correctly created a table and schema for the data (relevant column shown):
![Correct...
0 answers, 0 votes, 142 views, asked 5 months ago
Hello
I am using PySpark in a Glue job to do ETL on a table sourced from S3, which is in turn sourced from MySQL via DMS (table schema as below; columns 'op', 'row_updated_timestamp' & 'row_commit_timestamp' are...
1 answer, 0 votes, 137 views, asked 5 months ago
There was a data source (JSON files) in S3. The JSON structure is as follows.
I used AWS Glue Crawler to build the Glue table based on this S3 data source.
I think the "data" column should be "Struct"...
Accepted Answer | AWS Glue
2 answers, 0 votes, 414 views, asked 5 months ago
Crawler Error:
Insufficient Lake Formation permission(s) on mock_data_patient (Database name: crawl_db, Table Name: mock_data_patient) (Service: AWSGlue; Status Code: 400; Error Code:...
1 answer, 0 votes, 203 views, asked 5 months ago
I'm trying to build an ETL pipeline with AWS Glue, and the first step is to copy raw data from the original source to a staging bucket. The job is rather simple: source is a data catalog table (from...
1 answer, 0 votes, 285 views, asked 5 months ago
Hello,
In a Glue ETL made of nodes: Amazon S3, Change Schema, AWS Glue Data Catalog with the table "us_spending" backed by S3, I have the following error:
> Error Category: PERMISSION_ERROR;...
1 answer, 0 votes, 230 views, asked 5 months ago
I am looking for the best way to pass a parameter from one Glue job to another within a Step Functions workflow.
Each day, I will receive a file. In the file there will be data for certain dates. The first...
1 answer, 0 votes, 895 views, asked 5 months ago
We have a use case where we want to export ~500 TB of DynamoDB data to S3. One of the possible approaches that I found was to use an AWS Glue job.
Also while exporting the data to S3, we need to...
2 answers, 0 votes, 319 views, asked 5 months ago