Questions tagged with Extract Transform & Load Data
Reading a few GB (say, 15 GB) of skewed Parquet data, applying a few transformations such as data type changes for some columns, and then repartitioning (dataframe.repartition(120)) before writing it to S3 in...
1
answers
0
votes
277
views
asked a month ago
I have a Glue job that pushes data from Glue into OpenSearch.
The index ID column is created automatically when the data is inserted into OpenSearch.
I would like to pass the index ID _id...
1
answers
0
votes
287
views
asked a month ago
How can I "automatically" add new partitions to a Glue table based on a Hive formatted S3 bucket?
I have a bucket containing AWS AppStream logs in the format `s3://appstream-logs.../sessions/schedule=DAILY/year=2024/month=04/day=03/daily-session-report-2024-04-03.csv`. I have made this data available...
2
answers
0
votes
118
views
asked a month ago
I am working on migrating data from MySQL to S3 using AWS DMS. I want to employ wildcard mapping for the schema name in the DMS task's selection rules. Specifically, I aim to include tables from...
6
answers
0
votes
175
views
asked a month ago
We are encountering an issue where we're utilizing the "super" datatype. The column in the Parquet file we receive has a maximum length of 192K. How should we handle this data? Are there alternative...
1
answers
0
votes
228
views
asked a month ago
Example: s3://bucket1/mytable/ --> east-2 bucket folder with same schema
s3://bucket2/mytable/ --> west-2 bucket folder with same schema
Can we create a single table from these two...
3
answers
0
votes
534
views
asked a month ago
# Error while running UNLOAD to PARQUET query using column names with spaces in
## Introduction
I have a table in Athena with the following column names ["column space 1", "column space 2"]. I...
1
answers
0
votes
591
views
asked a month ago
I set up a replication task with AWS Database Migration Service to implement full load + CDC from an RDS instance to an S3 bucket. Since I want to use Athena to query the data in S3, I set the option...
2
answers
0
votes
185
views
asked a month ago
In Amazon Redshift, the general syntax for creating a procedure is as follows:
```
CREATE [ OR REPLACE ] PROCEDURE sp_procedure_name
( [ [ argname ] [ argmode ] argtype [, ...] ] )
[ NONATOMIC...
```
2
answers
0
votes
326
views
asked 2 months ago
I have defined two tables:
```
CREATE EXTERNAL TABLE `event_data`(
`systemid` string COMMENT 'from deserializer',
`eventtime` string COMMENT 'from deserializer',
`eventtype` string COMMENT...
```
1
answers
0
votes
553
views
asked 2 months ago
I need to replicate an Iceberg data lake stored in S3 from one bucket to another. However, a Multi-Region Access Point doesn't work with an Athena table, and I don't see any PySpark procedure that could...
1
answers
0
votes
341
views
asked 2 months ago
Hello everyone,
I created a spark_ready.py module that hosts multiple classes that I want to use as a template. I've seen in multiple configurations online that using the "spark.submit.pyFiles" will...
2
answers
0
votes
409
views
asked 2 months ago