Questions tagged with Extract Transform & Load Data
Hello, I am relatively new to Glue and am encountering some challenges with Glue ETL.
Our setup involves a data lake that retrieves data from a backend database as its source. This data lake is...
1 answer, 0 votes, 283 views, asked a month ago
Hello,
I have Parquet files in S3 that I parse using a Glue Crawler and query in Athena. I found that some files have two columns, "x" and "y", with type **int64**, while other files have them as...
1 answer, 0 votes, 235 views, asked a month ago
In our ETL process we are building out a pipeline where someone's job is to take input files (e.g. CSV) and map the columns to existing column names. After the mapping is complete, a Glue workflow will...
0 answers, 0 votes, 181 views, asked a month ago
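The column-mapping step described in the question above could look roughly like the following. This is only a minimal sketch using the Python standard library, not the asker's pipeline and not a Glue API: the `COLUMN_MAP` dictionary, the `remap_columns` helper, and the sample header names are all hypothetical.

```python
import csv
import io

# Hypothetical mapping from input header names to existing target column names.
COLUMN_MAP = {"cust_id": "customer_id", "amt": "amount"}


def remap_columns(csv_text, column_map):
    """Return CSV text whose header names are replaced according to column_map.

    Headers that do not appear in column_map are kept unchanged.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    # Only the first row (the header) is rewritten; data rows pass through as-is.
    rows[0] = [column_map.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Illustrative input only; real files would be read from wherever they land.
sample = "cust_id,amt,note\n1,9.50,ok\n"
remapped = remap_columns(sample, COLUMN_MAP)
```

Keeping the mapping as plain data means the person doing the mapping edits a dictionary (or a file loaded into one) rather than code.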
I am reading multiple files from S3 and writing the output to a Redshift database. Below is my code to read all the files from an S3 location (s3://abc/oms/YFS_CATEGORY_ITEM/)
```
yfs_category_item_df =...
```
2 answers, 0 votes, 528 views, asked a month ago
We have a Glue job that is writing a large number of items to DynamoDB.
**If a write to DynamoDB fails, how can we have access to these individual failed records in order to attempt to resolve and...**
1 answer, 0 votes, 302 views, asked 2 months ago
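One general way to keep hold of individual failed records, sketched here under stated assumptions: items are written one at a time through a caller-supplied function, and anything that still fails after a few attempts is collected for later inspection. This does not describe how the Glue DynamoDB connector behaves. `write_items`, `put_item`, and `fake_put_item` are hypothetical names; in practice `put_item` might be a thin wrapper around a boto3 `Table.put_item` call.

```python
def write_items(items, put_item, max_attempts=3):
    """Try to write each item; return (item, error message) pairs for failures.

    put_item is any callable that raises an exception when a write fails.
    """
    failed = []
    for item in items:
        last_error = None
        for _ in range(max_attempts):
            try:
                put_item(item)
                last_error = None
                break  # write succeeded, stop retrying this item
            except Exception as exc:  # broad on purpose: every failure is recorded
                last_error = exc
        if last_error is not None:
            failed.append((item, str(last_error)))
    return failed


def fake_put_item(item):
    """Stand-in writer for illustration: rejects items without an 'id' key."""
    if item.get("id") is None:
        raise ValueError("missing key attribute 'id'")


failed_records = write_items([{"id": 1}, {"name": "no-id"}], fake_put_item)
```

The returned list could then be written somewhere durable (a file or a queue, for instance) so the failed records can be examined and retried separately.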
Hi, I have created an external table in an AWS Glue catalog database.
The table points to an LZ4-compressed file on S3.
The table definition looks like this:
```
CREATE EXTERNAL TABLE `myapplogs`(
...
```
1 answer, 0 votes, 300 views, asked 2 months ago
Why don't Glue Job and Glue Workflow have version control and aliases like Lambda?
I tried to develop data orchestration with S3, Glue Job, and Glue Workflow. After I developed it, I found that Glue Job and Glue Workflow don't have version control and alias features...
0 answers, 0 votes, 190 views, asked 2 months ago
Hi team, first post, let me know if it provides a good explanation.
I'd like to know a way to minimize the effort for data ingestion.
We have two options as follows:
(1) CSV files from a file...
0 answers, 0 votes, 312 views, asked 2 months ago
I am running an EMR cluster with an attached notebook and using Apache Spark to load and process data; however, I have not been able to load data into Spark. Whenever I try to run...
1 answer, 0 votes, 349 views, asked 2 months ago
Hello!
I am new to AWS Glue and am starting to create data monitoring rules in AWS Glue. I have tried multiple options with CustomSQL but cannot seem to find the solution.
My problem: I want to check...
1 answer, 0 votes, 177 views, asked 2 months ago
Getting an error while connecting streaming data from Kinesis to Redshift with a few transformations using visual ETL (using an Amazon Kinesis Glue Data Catalog table as the source). Schema is already...
1 answer, 0 votes, 228 views, asked 2 months ago
We are using Tableau, and Tableau runs a scheduled query against Athena.
It worked well until yesterday, but today I got the error below.
> HIVE_CANNOT_OPEN_SPLIT: Error opening Hive split...
1 answer, 0 votes, 364 views, asked 2 months ago