Questions tagged with Extract Transform & Load Data

Redshift Procedure System Catalog

In Amazon Redshift, the general syntax for creating a procedure is as follows: ``` CREATE [ OR REPLACE ] PROCEDURE sp_procedure_name ( [ [ argname ] [ argmode ] argtype [, ...] ] ) [ NONATOMIC...

Accepted Answer

Analytics Database Extract Transform & Load Data Amazon Redshift

302 views

dashline

asked a month ago

In this simple test, why does Athena fail to prune partitions?

I have defined two tables: ``` CREATE EXTERNAL TABLE `event_data`( `systemid` string COMMENT 'from deserializer', `eventtime` string COMMENT 'from deserializer', `eventtype` string COMMENT...

Amazon Athena Extract Transform & Load Data

536 views

AlexR

asked a month ago

How do I replicate an Iceberg table used with Athena SQL and Athena PySpark?

I need to replicate an Iceberg data lake stored in S3 from one bucket to another. However, a multi-region access point doesn't work with an Athena table, and I don't see any PySpark procedure that could...

Amazon Simple Storage Service Amazon Athena AWS Glue Extract Transform & Load Data

272 views

DarkCenobyte

asked a month ago

Import Custom Python Modules on EMR Serverless through Spark Configuration

Hello everyone, I created a spark_ready.py module that hosts multiple classes that I want to use as a template. I've seen in multiple configurations online that using the "spark.submit.pyFiles" will...

Serverless Extract Transform & Load Data Amazon EMR Serverless

282 views

Justine

asked a month ago

current_time minus 1hr in Glue Pyspark

I need to fetch files that have arrived since current_time - 1hr from my S3 bucket for processing. My file names will be in the format yyyymmdd-hhmmsssss.parquet (including milliseconds). So I am running a...

Accepted Answer

Amazon Simple Storage Service Analytics AWS Glue Extract Transform & Load Data

381 views

Joe

asked a month ago

Athena Iceberg creates 100,000 files where just a few dozen were expected

I have an iceberg table defined like this: CREATE TABLE IF NOT EXISTS staging ( id STRING, staging_timestamp BIGINT, ... blah blah blah ... ) PARTITIONED BY...

Amazon Athena Extract Transform & Load Data

159 views

AlexR

asked a month ago

AWS Glue & PySpark: How to improve performance on a medium-to-large table

Hello team, I built an ETL in Python using PySpark. I have a bastion EC2 MySQL database that is a copy of a production environment. Every day it copies prod at around 2 o'clock, and my...

Accepted Answer

Database AWS Glue Extract Transform & Load Data

152 views

Ted

asked a month ago

How to write data from Glue ETL Streaming Job to Kinesis Data Stream?

Hello! According to the [documentation](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-kinesis-home.html), it should be possible to write data to Kinesis from Glue...

AWS Glue Extract Transform & Load Data Amazon Kinesis Data Streams Amazon Kinesis

669 views

Wojtek1902

asked a month ago

AWS Glue Workflow Trigger

I have a Glue job (job_a) that is triggered through a Lambda when a file is placed inside an S3 bucket. My requirement is, once this Glue job (job_a) is...

AWS Lambda AWS CloudFormation Amazon EventBridge AWS Glue Extract Transform & Load Data

286 views

Joe

asked a month ago

Line magics in Glue Docker container not found

I am interested particularly in `%additional_python_modules` and I always get this error: `UsageError: Line magic function `%additional_python_modules` not found.` The same error is thrown when I...

Accepted Answer

AWS Glue Extract Transform & Load Data

siyala

asked a month ago

Create lineage in DataHub from Glue job

I am running a PoC around integrating the Glue lineage into the [DataHub](https://datahubproject.io/). I have based my research on this set of AWS blog posts...

Analytics AWS Glue Extract Transform & Load Data

490 views

Denys

asked 2 months ago

AWS Glue Studio node long run time for data preview

Hi, I am using AWS Glue Studio to read from a DDB table with a direct DDB connection. So far my visual diagram has two nodes: 1. Source DDB table node -> Here preview takes 5 minutes for only 2 rows of...

AWS Glue Amazon DynamoDB Extract Transform & Load Data Amazon Redshift

168 views

aws-user-4575

asked 2 months ago
