Questions tagged with Extract Transform & Load Data
In Amazon Redshift, the general syntax for creating a procedure is as follows:
```
CREATE [ OR REPLACE ] PROCEDURE sp_procedure_name
( [ [ argname ] [ argmode ] argtype [, ...] ] )
[ NONATOMIC...
```
2 answers, 0 votes, 329 views, asked 2 months ago
I have defined two tables:
```
CREATE EXTERNAL TABLE `event_data`(
`systemid` string COMMENT 'from deserializer',
`eventtime` string COMMENT 'from deserializer',
`eventtype` string COMMENT...
```
1 answer, 0 votes, 555 views, asked 2 months ago
I need to replicate an Iceberg data lake stored in S3 from one bucket to another. However, a Multi-Region Access Point doesn't work with an Athena table, and I don't see any PySpark procedure that could...
1 answer, 0 votes, 360 views, asked 2 months ago
Hello everyone,
I created a spark_ready.py module that hosts multiple classes that I want to use as a template. I've seen in multiple configurations online that using the "spark.submit.pyFiles" will...
2 answers, 0 votes, 434 views, asked 2 months ago
I need to fetch the files that have arrived in my S3 bucket within the last hour (current_time - 1hr) for processing. My file names are in the format yyyymmdd-hhmmsssss.parquet (including milliseconds). So I am running a...
1 answer, 0 votes, 407 views, asked 2 months ago
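For the last-hour question above, one possible approach is to parse the timestamp embedded in each file name instead of relying on the object's modification time. The following is only a sketch under stated assumptions: the helper names `parse_key_timestamp` and `keys_from_last_hour`, the sample keys, and the choice to treat the file-name timestamps as UTC are all hypothetical and not taken from the original post.

```python
import posixpath
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

# File names are assumed to look like yyyymmdd-hhmmsssss.parquet,
# i.e. 8 date digits, a dash, then hhmmss plus 3 millisecond digits.
_STEM_PATTERN = re.compile(r"\d{8}-\d{9}")


def parse_key_timestamp(key: str) -> Optional[datetime]:
    """Return the timestamp encoded in an object key, or None if it does not match."""
    name = posixpath.basename(key)
    if not name.endswith(".parquet"):
        return None
    stem = name[: -len(".parquet")]
    if not _STEM_PATTERN.fullmatch(stem):
        return None
    try:
        # %f accepts 1 to 6 digits, so the 3 millisecond digits parse as a fraction.
        parsed = datetime.strptime(stem, "%Y%m%d-%H%M%S%f")
    except ValueError:
        return None
    # Assumption: timestamps in file names are UTC.
    return parsed.replace(tzinfo=timezone.utc)


def keys_from_last_hour(keys: Iterable[str], now: datetime) -> List[str]:
    """Keep keys whose embedded timestamp falls within [now - 1 hour, now]."""
    cutoff = now - timedelta(hours=1)
    selected = []
    for key in keys:
        stamp = parse_key_timestamp(key)
        if stamp is not None and cutoff <= stamp <= now:
            selected.append(key)
    return selected
```

The list of keys could come from any S3 listing (for example a boto3 `list_objects_v2` paginator), and `now` would typically be `datetime.now(timezone.utc)`. Passing `now` explicitly keeps the filter easy to test.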
I have an iceberg table defined like this:
CREATE TABLE IF NOT EXISTS staging (
id STRING,
staging_timestamp BIGINT,
... blah blah blah ...
)
PARTITIONED BY...
0 answers, 0 votes, 170 views, asked 2 months ago
Hello team,
So, I built an ETL in Python using PySpark. I have a bastion EC2 MySQL database that is a copy of a production environment.
Every day it copies prod at around 2 o'clock, and my...
1 answer, 0 votes, 186 views, asked 2 months ago
Hello! According to the [documentation](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-kinesis-home.html), it should be possible to write data to Kinesis from Glue...
2 answers, 0 votes, 1017 views, asked 2 months ago
I have a Glue job (job_a) that is started by a Lambda function whenever a file is placed in an S3 bucket. My requirement is, once this Glue job (job_a) is...
1 answer, 0 votes, 322 views, asked 2 months ago
I am particularly interested in `%additional_python_modules`, and I always get this error:
``UsageError: Line magic function `%additional_python_modules` not found.``
The same error is thrown when I...
2 answers, 0 votes, 122 views, asked 2 months ago
I am running a PoC around integrating the Glue lineage into the [DataHub](https://datahubproject.io/). I have based my research on this set of AWS blog posts...
1 answer, 0 votes, 537 views, asked 2 months ago
Hi, I am using AWS Glue Studio to read from a DDB table with a direct DDB connection. So far my visual diagram has two nodes:
1. Source DDB table node -> Here preview takes 5 minutes for only 2 rows of...
1 answer, 0 votes, 221 views, asked 2 months ago