Questions tagged with Extract Transform & Load Data
# Error while running UNLOAD to PARQUET query using column names with spaces in
## Introduction
I have a table in Athena with the following column names ["column space 1", "column space 2"]. I...
1 answer · 0 votes · 604 views · asked 2 months ago
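The question is truncated, so the cause of the error is not shown. If the spaces in the column names are the problem, one common workaround is to alias each column to a space-free name in the `SELECT` that feeds `UNLOAD`. The following is a minimal Python sketch that builds such a statement. `build_unload_query`, `safe_alias`, the table name, and the S3 prefix are hypothetical placeholders, not part of the original question.

```python
import re


def safe_alias(column: str) -> str:
    """Turn a column name like 'column space 1' into 'column_space_1'."""
    # Replace every run of characters other than letters, digits, or underscores.
    return re.sub(r"[^0-9A-Za-z_]+", "_", column.strip()).strip("_").lower()


def quote_identifier(name: str) -> str:
    """Double-quote an identifier for Athena SQL, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_unload_query(table: str, columns: list[str], s3_location: str) -> str:
    """Build an UNLOAD statement that writes Parquet with space-free column names."""
    select_list = ", ".join(
        f"{quote_identifier(col)} AS {safe_alias(col)}" for col in columns
    )
    return (
        f"UNLOAD (SELECT {select_list} FROM {table}) "
        f"TO '{s3_location}' "
        "WITH (format = 'PARQUET')"
    )
```

For the two columns in the question, `build_unload_query("my_table", ["column space 1", "column space 2"], "s3://example-bucket/unload/")` selects `"column space 1" AS column_space_1` and `"column space 2" AS column_space_2`. The sketch does not detect alias collisions. For example, `a b` and `a_b` would both become `a_b`.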
I set up a replication task with AWS Database Migration Service to implement full load + CDC from an RDS instance to an S3 bucket. Since I want to use Athena to query the data in S3, I set the option...
2 answers · 0 votes · 201 views · asked 2 months ago
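For context on how full load + CDC output in S3 is usually read back: DMS CDC records written to S3 carry an `Op` column whose value is `I`, `U`, or `D`. The following is a minimal Python sketch of collapsing a full-load snapshot plus CDC records into the current state of each row. It assumes the changes are already in commit order and that `id` is the primary key. Both the `id` column and the `apply_cdc` function are hypothetical.

```python
from typing import Iterable


def apply_cdc(full_load: Iterable[dict], changes: Iterable[dict], key: str = "id") -> dict:
    """Collapse a full-load snapshot plus ordered CDC records into current rows.

    Assumes each CDC record has an 'Op' field ('I', 'U', or 'D') and that
    `changes` is sorted in commit order. `key` is a hypothetical primary-key column.
    """
    state = {}
    # Full-load rows carry no Op column by default, so treat them as inserts.
    for row in full_load:
        state[row[key]] = {k: v for k, v in row.items() if k != "Op"}
    for change in changes:
        op = change["Op"]
        row = {k: v for k, v in change.items() if k != "Op"}
        if op in ("I", "U"):
            state[row[key]] = row  # insert, or overwrite with the latest image
        elif op == "D":
            state.pop(row[key], None)  # remove deleted rows
        else:
            raise ValueError(f"unexpected Op value: {op!r}")
    return state
```

In Athena, the same idea is typically expressed in SQL with a window function such as `row_number()` partitioned by the key and ordered by a change timestamp, keeping the latest non-deleted row.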
In Amazon Redshift, the general syntax for creating a procedure is as follows:
```
CREATE [ OR REPLACE ] PROCEDURE sp_procedure_name
( [ [ argname ] [ argmode ] argtype [, ...] ] )
[ NONATOMIC...
```
2 answers · 0 votes · 332 views · asked 2 months ago
I have defined two tables:
```
CREATE EXTERNAL TABLE `event_data`(
`systemid` string COMMENT 'from deserializer',
`eventtime` string COMMENT 'from deserializer',
`eventtype` string COMMENT...
```
1 answer · 0 votes · 560 views · asked 2 months ago
I need to replicate an Iceberg data lake stored in S3 from one bucket to another. However, Multi-Region Access Points don't work with Athena tables, and I don't see any PySpark procedure that could...
1 answer · 0 votes · 381 views · asked 2 months ago
Hello everyone,
I created a `spark_ready.py` module that hosts multiple classes I want to use as templates. I've seen in multiple configurations online that setting `spark.submit.pyFiles` will...
2 answers · 0 votes · 460 views · asked 2 months ago
I need to fetch the files that have arrived in my S3 bucket within the last hour (current_time - 1 hr) for processing. My file names are in the format `yyyymmdd-hhmmsssss.parquet` (including milliseconds). So I am running a...
1 answer · 0 votes · 411 views · asked 2 months ago
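One way to select the last hour's files is to parse the timestamp embedded in each object key instead of relying on listing order. The following is a minimal Python sketch. It assumes the keys have already been listed (for example, with S3's `ListObjectsV2`), that the filename timestamp and `now` share a time zone such as UTC, and that the last three digits of `hhmmsssss` are milliseconds. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_file_timestamp(key: str) -> datetime:
    """Parse '<prefix>/yyyymmdd-hhmmsssss.parquet' into a datetime."""
    name = key.rsplit("/", 1)[-1]           # drop any S3 prefix
    stem = name.removesuffix(".parquet")    # Python 3.9+
    date_part, time_part = stem.split("-")
    if len(date_part) != 8 or len(time_part) != 9:
        raise ValueError(f"unexpected file name format: {name!r}")
    # time_part is HHMMSS followed by three millisecond digits.
    return datetime(
        int(date_part[0:4]), int(date_part[4:6]), int(date_part[6:8]),
        int(time_part[0:2]), int(time_part[2:4]), int(time_part[4:6]),
        int(time_part[6:9]) * 1000,         # milliseconds -> microseconds
    )


def files_from_last_hour(keys: list[str], now: datetime) -> list[str]:
    """Keep .parquet keys whose embedded timestamp lies in [now - 1 h, now]."""
    cutoff = now - timedelta(hours=1)
    return [
        k for k in keys
        if k.endswith(".parquet") and cutoff <= parse_file_timestamp(k) <= now
    ]
```

As an alternative, `ListObjectsV2` returns a `LastModified` time for each object. That value reflects when the object was written to S3, not the time encoded in the file name, so the two can differ for late uploads.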
I have an Iceberg table defined like this:
```
CREATE TABLE IF NOT EXISTS staging (
id STRING,
staging_timestamp BIGINT,
... blah blah blah ...
)
PARTITIONED BY...
```
0 answers · 0 votes · 173 views · asked 2 months ago
Hello team,
So, I built an ETL pipeline in Python using PySpark. I have a MySQL database on a bastion EC2 instance that is a copy of the production environment.
Every day it copies prod at around 2 o'clock, and my...
1 answer · 0 votes · 194 views · asked 2 months ago
Hello! According to the [documentation](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-kinesis-home.html), it should be possible to write data to Kinesis from Glue...
2 answers · 0 votes · 1107 views · asked 2 months ago
I have a Glue job (job_a) that is started by a Lambda function: when a file is placed in an S3 bucket, the Lambda triggers job_a. My requirement is, once this Glue job (job_a) is...
1 answer · 0 votes · 336 views · asked 2 months ago
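The part of this setup that is fully described, where an S3 upload invokes a Lambda function that starts job_a, could look roughly like the following Python sketch. It uses the boto3 Glue client's `start_job_run`. The `--source_bucket` and `--source_key` job arguments are hypothetical. The optional `glue_client` parameter is an assumption added so the handler can be exercised with a stub client instead of calling AWS.

```python
from urllib.parse import unquote_plus

JOB_NAME = "job_a"  # job name taken from the question


def lambda_handler(event, context, glue_client=None):
    """Start job_a once for each object listed in an S3 event notification."""
    if glue_client is None:
        import boto3  # imported lazily so a stub client can be injected in tests
        glue_client = boto3.client("glue")
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces as '+') in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = glue_client.start_job_run(
            JobName=JOB_NAME,
            # Hypothetical arguments; the Glue script could read them with getResolvedOptions.
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return run_ids
```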
I am particularly interested in `%additional_python_modules`, and I always get this error:
``UsageError: Line magic function `%additional_python_modules` not found.``
The same error is thrown when I...
2 answers · 0 votes · 127 views · asked 2 months ago