Questions tagged with Extract Transform & Load Data
Hey Guys
I want to run my PySpark script on EMR Serverless, but the script needs some dependencies/libraries to run. Please suggest an optimized approach to import the...
1 answer, 0 votes, 356 views, asked 9 months ago
Hello,
I have gone through the recommended changes provided in [this](https://repost.aws/knowledge-center/glue-crawler-internal-service-exception) article. However, I continue to get the same...
1 answer, 0 votes, 225 views, asked 9 months ago
Hi there, I managed to convert CSV files to Parquet files using a Glue job. My crawler does see the Parquet files in the S3 bucket, crawls them, presents me with the proper schema, and adds for each...
1 answer, 0 votes, 451 views, asked 9 months ago
Are there any known, recently (~07/18/2023) introduced performance issues with Glue crawlers?
We have recently observed excessive slowness with Glue crawlers that had been running for months without...
0 answers, 0 votes, 28 views, asked 9 months ago
Glue 4 Hudi support
I am trying to store a data stream from Kafka using the Hudi format. I am following this doc https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html and I even tried to...
3 answers, 0 votes, 253 views, asked 9 months ago
We are moving our content from one developer to another. I am trying to figure out how to stop my image and PDF uploads from being rasterized.
My old developer had figured it out, but I can't.
Any...
1 answer, 0 votes, 192 views, asked 9 months ago
I am trying to create a Delta table from Spark SQL using the Glue Data Catalog.
I can correctly query a Delta table using the Glue metastore:
```
%%sql
select * from `my_table` VERSION AS OF 1 limit...
```
2 answers, 0 votes, 1339 views, asked 9 months ago
Hi Team,
I have a complex nested XML file that I want to read using AWS Glue and convert to Parquet format. I want to use the pandas read_xml function to read the XML file, but I get the error lxml not...
1 answer, 0 votes, 272 views, asked 9 months ago
I read data from S3 as follows:
```
sec_id_dyf = glueContext.create_dynamic_frame.from_options(
    connection_type = 's3',
...
```
0 answers, 0 votes, 92 views, asked 9 months ago
Hello, I have been experimenting with AWS Glue and created some crawlers to crawl the data, but the behavior wasn't what I expected.
Question 1) I had an S3 bucket with 3...
1 answer, 0 votes, 281 views, asked 9 months ago
Looks like attempting to write to a Delta Lake table from a DynamicFrame is not working. The Visual Glue interface generates a script like:
```
s3 =...
```
2 answers, 0 votes, 396 views, asked 9 months ago
We are trying to use Glue to query and aggregate some Parquet files in S3.
We get this error related to schema mismatch:
```
An error occurred while calling o106.pyWriteDynamicFrame....
```
2 answers, 0 votes, 223 views, asked 9 months ago