How fast can Glue ETL convert data to Parquet?


Do we have any benchmark numbers on how fast Glue ETL converts data to Parquet?
For example, something like "1 DPU can process 1 GB of raw data in X minutes."

I want a baseline number so I can tell whether an ETL job is running normally or has a problem.
It would also help me estimate how many DPUs to use for my data conversion task.

Thanks

Asked 5 years ago · 593 views

1 Answer
Accepted Answer

It really depends on how your data is structured. If it's a single 1 GB file, it's not going to benefit from Glue's ability to fan out work across executors. If it's 1,024 files of 1 MB each, then you will see the benefit. Throughput also depends on the Parquet block size, which should be tuned for optimal I/O (see tip #5 here: https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/).
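
As a rough illustration, a Glue PySpark job for this kind of conversion can set the Parquet block size explicitly when it writes. This is a minimal sketch, not your exact job: the catalog names `my_database` and `my_table`, the output path `s3://my-bucket/parquet/`, and the 128 MB block size are placeholder assumptions you would replace with your own values.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the source table from the Glue Data Catalog (placeholder names).
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table",
)

# Write Parquet with Snappy compression and a 128 MB row-group (block) size,
# in line with tip #5 from the Athena performance-tuning post linked above.
(
    dyf.toDF()
    .write.option("compression", "snappy")
    .option("parquet.block.size", 128 * 1024 * 1024)
    .mode("overwrite")
    .parquet("s3://my-bucket/parquet/")
)

job.commit()
```

With many small input files, Glue parallelizes the read and write across executors, which is where adding DPUs actually pays off; a single large uncompressed or gzipped file limits that parallelism regardless of DPU count.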

I could only find some information on how to tune your DPUs optimally. The example given there converts 428 gzipped JSON files to Parquet files.

https://docs.aws.amazon.com/glue/latest/dg/monitor-debug-capacity.html

AWS
Answered 5 years ago
Expert · Reviewed 1 month ago
