Questions tagged with Extract Transform & Load Data
Content language: English
Select up to 5 tags to filter
Sort by most recent
Browse through the questions and answers listed below or filter and sort to narrow down your results.
I have defined two tables:
```
CREATE EXTERNAL TABLE `event_data`(
`systemid` string COMMENT 'from deserializer',
`eventtime` string COMMENT 'from deserializer',
`eventtype` string COMMENT...
1
answers
0
votes
585
views
asked 3 months agolg...
I need to replicate an iceberg datalake stored in S3 from one bucket to another. However, multi-region access point doesn't work with Athena table. And I don't see any pyspark procedure that could...
1
answers
0
votes
467
views
asked 3 months agolg...
Hello everyone,
I created a spark_ready.py module that hosts multiple classes that I want to use as a template. I've seen in multiple configurations online that using the "spark.submit.pyFiles" will...
2
answers
0
votes
587
views
asked 3 months agolg...
I need to fetch files that has arrived current_time - 1hr from my S3 bucket for processing. My files name will be in format yyyymmdd-hhmmsssss.parquet (includes milli seconds also). So I am running a...
1
answers
0
votes
449
views
asked 3 months agolg...
I have an iceberg table defined like this:
CREATE TABLE IF NOT EXISTS staging (
id STRING,
staging_timestamp BIGINT,
... blah blah blah ...
)
PARTITIONED BY...
0
answers
0
votes
184
views
asked 3 months agolg...
Hello team,
So, I built an ETL in python using pyspark. I have a bastion EC2 mysql database that is a copy of a production environment.
Every day it is copying the prod at round 2 oclock, and my...
1
answers
0
votes
265
views
asked 3 months agolg...
Hello! According to the [documentation](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-kinesis-home.html), it should be possible to write data to Kinesis from Glue...
2
answers
0
votes
1614
views
asked 3 months agolg...
I have a glue job (job_a) that starts through a Lambda. When a file is placed inside an S3 bucket, I am triggering a glue job (job_a) through Lambda. My requirement is, once this glue job (job_a), is...
1
answers
0
votes
380
views
asked 3 months agolg...
I am interested particularly in `%additional_python_modules` and I always get this error:
`UsageError: Line magic function `%additional_python_modules` not found.`
The same error is thrown when I...
2
answers
0
votes
160
views
asked 3 months agolg...
I am running a PoC around integrating the Glue lineage into the [DataHub](https://datahubproject.io/). I have based my research on this set of AWS blog posts...
1
answers
0
votes
595
views
asked 4 months agolg...
Hi, I am using AWS glue studio to read from a DDB table with direct DDB connection. So far my visual diagram has two nodes:
1. Source DDB table node -> Here preview takes 5 minutes for only 2 rows of...
1
answers
0
votes
293
views
asked 4 months agolg...
Is it possible to wildcard the include path for a MongoDB crawler. I've tried a number of different options similar to the options available for JDBC and other relational database connections, but...
1
answers
0
votes
151
views
asked 4 months agolg...