Questions tagged with Extract Transform & Load Data

Spark shuffles a huge amount of data even though the data read is not huge

Reading a few GB (say 15 GB) of skewed Parquet data, after a few transformations such as data type changes for some columns, and then repartitioning (dataframe.repartition(120)) before writing it to S3 in...

Tags: AWS Glue, Extract Transform & Load Data, Amazon GameSparks, S3 Select

277 views, asked a month ago by Bibhu

AWS Glue to OpenSearch index: custom index ID

I have a Glue job that pushes data from Glue into OpenSearch. The index ID column is created automatically when the data is inserted into OpenSearch. I would like to pass the index ID _id...

Accepted Answer

Tags: Analytics, AWS Glue, Amazon OpenSearch Service, Extract Transform & Load Data

287 views, asked a month ago by srm

How can I "automatically" add new partitions to a Glue table based on a Hive formatted S3 bucket?

I have a bucket containing AWS AppStream logs in the format `s3://appstream-logs.../sessions/schedule=DAILY/year=2024/month=04/day=03/daily-session-report-2024-04-03.csv`. I have made this data available...

Accepted Answer

Tags: AWS Glue, Extract Transform & Load Data

118 views, asked a month ago by Andreax

AWS DMS Migration Task Fails with "No Tables Found" Using Wildcard in Schema Mapping for MySQL

I am working on migrating data from MySQL to S3 using AWS DMS. I want to employ wildcard mapping for the schema name in the DMS task's selection rules. Specifically, I aim to include tables from...

Tags: AWS Database Migration Service, AWS Glue, Extract Transform & Load Data

175 views, asked a month ago by Bhavesh

Redshift SUPER data type not large enough to store a JSON column from Postgres

We are encountering an issue while using the "super" data type. The column in the Parquet file we receive has a maximum length of 192K. How should we handle this data? Are there alternative...

Tags: AWS Glue, Extract Transform & Load Data, Amazon Redshift

228 views, asked a month ago by msve

Is it possible to create a single table from data with the same schema when the folders are in S3 buckets in different regions?

Example: s3://bucket1/mytable/ --> east-2 bucket folder with the same schema; s3://bucket2/mytable/ --> west-2 bucket folder with the same schema. Can we create a single table from these two...

Tags: Amazon Simple Storage Service, Amazon Athena, AWS Glue, Storage, Extract Transform & Load Data

534 views, asked a month ago

Athena error while running UNLOAD to PARQUET query using column names with spaces in -- GENERIC_INTERNAL_ERROR: field ended by ';': expected ';'

Introduction: I have a table in Athena with the following column names ["column space 1", "column space 2"]. I...

Tags: Amazon Athena, AWS Glue, Extract Transform & Load Data, Amazon Redshift

591 views, asked a month ago by toby

Glue Data Catalog configuration when updating with Database Migration Service

I set up a replication task with AWS Database Migration Service to implement full load + CDC from an RDS instance to an S3 bucket. Since I want to use Athena to query the data in S3, I set the option...

Accepted Answer

Tags: AWS Database Migration Service, AWS Glue, Extract Transform & Load Data

185 views, asked a month ago by Simona B

Redshift Procedure System Catalog

In Amazon Redshift, the general syntax for creating a procedure is as follows: ``` CREATE [ OR REPLACE ] PROCEDURE sp_procedure_name ( [ [ argname ] [ argmode ] argtype [, ...] ] ) [ NONATOMIC...

Accepted Answer

Tags: Analytics, Database, Extract Transform & Load Data, Amazon Redshift

326 views, asked 2 months ago by dashline

In this simple test, why does Athena fail to prune partitions?

I have defined two tables: ``` CREATE EXTERNAL TABLE `event_data`( `systemid` string COMMENT 'from deserializer', `eventtime` string COMMENT 'from deserializer', `eventtype` string COMMENT...

Tags: Amazon Athena, Extract Transform & Load Data

553 views, asked 2 months ago by AlexR

How do I replicate an Iceberg table used with Athena SQL and Athena PySpark?

I need to replicate an Iceberg data lake stored in S3 from one bucket to another. However, a multi-region access point doesn't work with Athena tables, and I don't see any PySpark procedure that could...

Tags: Amazon Simple Storage Service, Amazon Athena, AWS Glue, Extract Transform & Load Data

341 views, asked 2 months ago by DarkCenobyte

Import Custom Python Modules on EMR Serverless through Spark Configuration

Hello everyone, I created a spark_ready.py module that hosts multiple classes that I want to use as a template. I've seen in multiple configurations online that using "spark.submit.pyFiles" will...

Tags: Serverless, Extract Transform & Load Data, Amazon EMR Serverless

409 views, asked 2 months ago by Justine
