Questions tagged with Extract Transform & Load Data
I have JSON data stored on S3 which I have created Glue tables over. This data is partitioned and I use Glue crawlers to update the table partitions. I then load this data as a Glue DynamicFrame...
1 answer, 1 vote, 481 views, asked a year ago
I have an AWS Glue PII data detector job. It's taking around 47 minutes to complete for a 17.9 MB file, which is a very long time for any Spark job.
Sharing the code snippet used in the...
1 answer, 0 votes, 440 views, asked a year ago
I ran a Python query to transform the JSON format to Parquet format and it completed successfully. I can see the Parquet file and the columns, but when I try to run the query using Athena,...
2 answers, 0 votes, 821 views, asked a year ago
New to AWS, trying to create a table through a Glue crawler from a .json file that I uploaded into S3.
Hello, any help would be much appreciated. I have two files that I need to make tables for. One is a CSV file that I was able to get the table loaded for through a Glue crawler. The other file I was not...
1 answer, 0 votes, 1122 views, asked a year ago
We have a number of saved report templates and when we look at the generated csv files, there are a few problems.
1. the columns change order randomly
2. in the csv some of the column headers (such...
2 answers, 0 votes, 261 views, asked a year ago
I am using Amazon Kinesis Firehose to convert files from JSON to Parquet, leveraging Glue for table creation.
When the data is blank, the Glue schema creates a NULL and the conversion at Kinesis...
1 answer, 0 votes, 2149 views, asked a year ago
Hi,
I'm trying to run a Python job on EMR with some dependencies installed with venv, as follows:
```
python -m venv pyspark_venv
source pyspark_venv/bin/activate
pip install pyarrow pandas...
```
1 answer, 0 votes, 876 views, asked a year ago
Hey all! Wondering if anybody has experience with the best way to ETL data from Influx2 to my AWS S3 data lake. I have been looking for Influx2 JDBC connectors (I have used these for PG) but...
0 answers, 0 votes, 48 views, asked a year ago
Hey! I currently have a setup with a crawler that connects to a PostgreSQL database over JDBC. This works, and the crawler generates around 20 tables for this database.
I now want to create an ETL job...
1 answer, 0 votes, 938 views, asked a year ago
I want to extract data from my PostgreSQL database on RDS using AWS Glue, transform the data, and export it to an S3 bucket. How do I do that? I need an AWS tutorial on this.
2 answers, 0 votes, 1086 views, asked a year ago
I am writing a Glue script to take data from S3 (PSQL WAL logs) and write it into a Hudi data lake.
Whenever I try to do that, I get an "unable to upsert data with commit time" error with...
1 answer, 0 votes, 744 views, asked a year ago
```
df.toDF().write.format("jdbc").\
option("url", "").\
option("dbtable", f"public.{tableName}_staging").\
option("user", "").\
option("password", "").\
...
```
1 answer, 0 votes, 353 views, asked a year ago