Questions tagged with Extract Transform & Load Data
Reading a few GB (say 15 GB) of skewed Parquet data, applying a few transformations such as data type changes for some columns, and then repartitioning (dataframe.repartition(120)) before writing it to S3 in...
1 answer · 0 votes · 297 views · asked 2 months ago
I have a Glue job that pushes data from Glue into OpenSearch.
The index ID column is created automatically when the data is inserted into OpenSearch.
I would like to pass the index ID _id...
1 answer · 0 votes · 348 views · asked 2 months ago
How can I "automatically" add new partitions to a Glue table based on a Hive-formatted S3 bucket?
I have a Bucket containing AWS AppStream logs on format `s3://appstream-logs.../sessions/schedule=DAILY/year=2024/month=04/day=03/daily-session-report-2024-04-03.csv`. I have made this data available...
2 answers · 0 votes · 141 views · asked 2 months ago
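For a Hive-formatted layout like the one above, one common pattern (a sketch, not necessarily the asker's setup; the table name is illustrative) is to let Athena discover the partitions, or to register them explicitly:

```
-- Discover all Hive-style partitions (schedule=.../year=.../month=.../day=...)
-- that already exist under the table's S3 location:
MSCK REPAIR TABLE appstream_sessions;

-- Or register a single day's partition explicitly:
ALTER TABLE appstream_sessions ADD IF NOT EXISTS
  PARTITION (schedule = 'DAILY', year = '2024', month = '04', day = '03');
```

Athena partition projection is another option that avoids per-partition DDL entirely, since partition values are computed from the key layout at query time.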
I am working on migrating data from MySQL to S3 using AWS DMS. I want to employ wildcard mapping for the schema name in the DMS task's selection rules. Specifically, I aim to include tables from...
6 answers · 0 votes · 197 views · asked 2 months ago
We are encountering an issue while using the SUPER data type. The column in the Parquet file we receive has a maximum length of 192K. How should we handle this data? Are there alternative...
2 answers · 0 votes · 272 views · asked 2 months ago
Example: s3://bucket1/mytable/ -> east-2 bucket folder with the same schema
s3://bucket2/mytable/ -> west-2 bucket folder with the same schema
Can we create a single table from these two...
3 answers · 0 votes · 556 views · asked 2 months ago
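One mechanism worth knowing here (a sketch; all bucket, table, and column names are illustrative, and cross-region queries are subject to Athena's cross-region support and data-transfer costs): a Glue/Athena table has a single base LOCATION, but each partition can point at a different bucket.

```
CREATE EXTERNAL TABLE mytable (
  id      string,
  payload string
)
PARTITIONED BY (region string)
STORED AS PARQUET
LOCATION 's3://bucket1/mytable/';

ALTER TABLE mytable ADD
  PARTITION (region = 'east-2') LOCATION 's3://bucket1/mytable/'
  PARTITION (region = 'west-2') LOCATION 's3://bucket2/mytable/';
```

Queries against mytable then read both buckets, and the region partition key lets you prune to one of them.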
# Error while running UNLOAD to PARQUET query using column names with spaces in
## Introduction
I have a table in Athena with the following column names ["column space 1", "column space 2"]. I...
1 answer · 0 votes · 624 views · asked 2 months ago
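Since Parquet does not allow spaces in column names, one workaround (a sketch; the table name and S3 path are illustrative) is to alias the columns inside the UNLOAD query:

```
UNLOAD (
  SELECT "column space 1" AS column_space_1,
         "column space 2" AS column_space_2
  FROM my_table
)
TO 's3://my-bucket/unload/'
WITH (format = 'PARQUET')
```

The aliases become the Parquet column names, so the write no longer trips over the spaces.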
I set up a replication task with AWS Database Migration Service to implement full load + CDC from an RDS instance to an S3 bucket. Since I want to use Athena to query the data in S3, I set the option...
2 answers · 0 votes · 226 views · asked 2 months ago
In Amazon Redshift, the general syntax for creating a procedure is as follows:
```
CREATE [ OR REPLACE ] PROCEDURE sp_procedure_name
( [ [ argname ] [ argmode ] argtype [, ...] ] )
[ NONATOMIC...
2 answers · 0 votes · 343 views · asked 2 months ago
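A minimal concrete instance of that syntax may help (a sketch; all table, column, and procedure names are illustrative):

```
CREATE OR REPLACE PROCEDURE sp_load_staging(batch_date IN date)
AS $$
BEGIN
  DELETE FROM staging_events WHERE event_date = batch_date;
  INSERT INTO staging_events
    SELECT * FROM raw_events WHERE event_date = batch_date;
END;
$$ LANGUAGE plpgsql;

-- Procedures are invoked with CALL, not SELECT:
CALL sp_load_staging('2024-04-03'::date);
```

Here batch_date is the argname, IN the argmode, and date the argtype from the general form above.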
I have defined two tables:
```
CREATE EXTERNAL TABLE `event_data`(
`systemid` string COMMENT 'from deserializer',
`eventtime` string COMMENT 'from deserializer',
`eventtype` string COMMENT...
1 answer · 0 votes · 571 views · asked 2 months ago
I need to replicate an Iceberg data lake stored in S3 from one bucket to another. However, a multi-region access point doesn't work with Athena tables, and I don't see any PySpark procedure that could...
1 answer · 0 votes · 424 views · asked 3 months ago
Hello everyone,
I created a spark_ready.py module that hosts the classes I want to reuse as a template. I've seen in multiple configurations online that setting "spark.submit.pyFiles" will...
2 answers · 0 votes · 517 views · asked 3 months ago
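The usual way this is wired up (a sketch; the S3 path is an assumption) is to stage the module somewhere the cluster can read and point spark.submit.pyFiles at it, e.g. in spark-defaults.conf or as a --conf on the job:

```
spark.submit.pyFiles  s3://my-bucket/modules/spark_ready.py
```

With the file shipped this way, `import spark_ready` should resolve on both the driver and the executors.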