How fast can Glue ETL convert data to Parquet?


Are there any benchmark numbers for how fast Glue ETL converts data to Parquet?
For example, 1 DPU can process 1 GB of raw data in X minutes.

I want a baseline number so I can tell whether an ETL job is running normally or has a problem,
and also to estimate how many DPUs I should use for my data conversion task.

Thanks

MODERATOR
asked 5 years ago, 586 views
1 Answer
Accepted Answer

It really depends on how your data is structured. If it is a single 1 GB file, it will not benefit from Glue's ability to fan out the work. If it is 1,024 files of 1 MB each, you will see the benefit. Throughput also depends on the Parquet block size, which should be chosen to allow for optimal I/O (see tip #5 here: https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/).
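To make the fan-out point concrete, here is a rough sketch in plain Python. It assumes each input file becomes exactly one task, which is the case for non-splittable inputs such as gzip; a splittable file may be divided further. The function names and the figure of 16 parallel slots are hypothetical and are not Glue settings.

```python
import math


def busy_slots(num_files: int, parallel_slots: int) -> int:
    """Slots that actually have work, assuming one task per file."""
    if num_files <= 0 or parallel_slots <= 0:
        raise ValueError("num_files and parallel_slots must be positive")
    return min(num_files, parallel_slots)


def waves(num_files: int, parallel_slots: int) -> int:
    """Rounds of work needed to get through every file."""
    if num_files <= 0 or parallel_slots <= 0:
        raise ValueError("num_files and parallel_slots must be positive")
    return math.ceil(num_files / parallel_slots)


# Hypothetical cluster with 16 parallel slots (assumption, not a Glue default).
single_file = (busy_slots(1, 16), waves(1, 16))
many_files = (busy_slots(1024, 16), waves(1024, 16))
print(single_file, many_files)
```

With a single file only one slot is ever busy, so adding DPUs would not shorten the job. With 1,024 files every slot stays busy, and more slots mean fewer rounds.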

I could not find a benchmark number, only some information on how to tune your DPU count optimally. The example given there converts 428 gzipped JSON files to Parquet files.

https://docs.aws.amazon.com/glue/latest/dg/monitor-debug-capacity.html
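Since there is no published figure, one way to get a baseline is to measure a sample run on your own data and extrapolate. The sketch below does that arithmetic in plain Python. It assumes throughput scales linearly with DPUs, which only holds while the input has enough files to keep every DPU busy. The function names and all of the numbers are hypothetical placeholders, not benchmark results.

```python
import math


def gb_per_dpu_minute(input_gb: float, dpus: int, minutes: float) -> float:
    """Throughput observed in a sample run, normalized per DPU-minute."""
    if input_gb <= 0 or dpus <= 0 or minutes <= 0:
        raise ValueError("all inputs must be positive")
    return input_gb / (dpus * minutes)


def estimate_dpus(target_gb: float, target_minutes: float, baseline: float) -> int:
    """DPUs needed to convert target_gb within target_minutes,
    assuming the measured baseline scales linearly (an assumption)."""
    if target_gb <= 0 or target_minutes <= 0 or baseline <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(target_gb / (baseline * target_minutes))


# Placeholder sample run: 10 GB on 10 DPUs in 20 minutes (made-up values).
baseline = gb_per_dpu_minute(input_gb=10, dpus=10, minutes=20)

# Placeholder target: 500 GB within 60 minutes.
needed = estimate_dpus(target_gb=500, target_minutes=60, baseline=baseline)
print(baseline, needed)
```

If a later run on similar data falls well below your own measured baseline, that is a reasonable signal to look at the job metrics described on the page above.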

AWS
answered 5 years ago
EXPERT
reviewed 24 days ago
