How fast can Glue ETL convert data to Parquet?

Do we have any benchmark numbers on how fast Glue ETL converts data to Parquet?
For example, 1 DPU can process 1 GB of raw data in X minutes.

I want a baseline number so I can tell whether an ETL job is running normally or has a problem,
and also to estimate how many DPUs I should use for my data conversion task.

Thanks

Moderator
Asked 5 years ago, 593 views
1 Answer
Accepted Answer

It really depends on how your data is structured. If it's a single 1 GB file, it won't benefit from Glue's ability to fan out the work. If it's 1,024 files of 1 MB each, you will see the benefit. It also depends on the Parquet block size, which affects how efficient the I/O is (see tip #5 here: https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/).

I could only find some information on how to tune your DPUs optimally. The example given there converts 428 gzipped JSON files to Parquet files.

https://docs.aws.amazon.com/glue/latest/dg/monitor-debug-capacity.html
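
As a rough illustration of the kind of sizing arithmetic involved, here is a minimal Python sketch. It is not an official formula. It assumes one Spark task per input file (reasonable for gzipped files, which cannot be split), 4 tasks per executor, 2 executors per DPU, one DPU reserved for management, and one executor slot taken by the driver. Those ratios are assumptions that vary by Glue version and worker type, and the function names are made up for this example, so check them against your own job metrics.

```python
import math

# Assumed ratios, NOT guaranteed for every Glue version or worker type.
TASKS_PER_EXECUTOR = 4   # assumption: concurrent tasks one executor can run
EXECUTORS_PER_DPU = 2    # assumption: executors hosted on one worker DPU


def max_needed_executors(num_tasks, tasks_per_executor=TASKS_PER_EXECUTOR):
    """Executors needed to run all tasks at once (one task per unsplittable file)."""
    return math.ceil(num_tasks / tasks_per_executor)


def dpus_for_executors(executors, executors_per_dpu=EXECUTORS_PER_DPU):
    """DPUs needed to host that many executors.

    Assumes one extra executor slot for the Spark driver and one extra DPU
    reserved for management, so usable executors = (dpus - 1) * 2 - 1.
    """
    return math.ceil((executors + 1) / executors_per_dpu) + 1


if __name__ == "__main__":
    files = 428  # the gzipped JSON file count mentioned above
    executors = max_needed_executors(files)
    print(executors, dpus_for_executors(executors))  # prints: 107 55
```

Under these assumptions, adding DPUs beyond that point would not help, because there would be no more tasks left to run in parallel. For a real baseline, time one run of your own job and divide the input size by DPUs times minutes.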

AWS
Answered 5 years ago
Expert
Reviewed a month ago
