How fast can Glue ETL convert data to Parquet?


Do we have any benchmark numbers on how fast Glue ETL converts data to Parquet?
For example, "1 DPU can process 1 GB of raw data in X minutes."

I want a baseline number so I can tell whether an ETL job is running normally or has a problem,
and also so I can estimate how many DPUs I should use for my data conversion task.

Thanks

MODERATOR
asked 5 years ago · 593 views
1 Answer
Accepted Answer

It really depends on how your data is structured. If it's a single 1 GB file, it won't benefit from Glue's ability to fan out work across workers. If it's 1,024 files of 1 MB each, you will see the benefit. It also depends on the Parquet block size, which should be chosen to allow for optimal I/O (see tip #5 here: https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/).

I couldn't find a published benchmark, only some guidance on how to tune your DPU allocation. The example given there converts 428 gzipped JSON files to Parquet files:

https://docs.aws.amazon.com/glue/latest/dg/monitor-debug-capacity.html
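
Since there is no official number, one way to get a baseline is to derive it from one of your own past runs. Below is a minimal sketch of that arithmetic in Python. The function names, the example figures (50 GB, 30 minutes, 10 DPUs), and the `min_dpus=2` floor are all assumptions for illustration, not values from the Glue documentation. It also assumes throughput scales linearly with DPUs, which only holds when the input is split into enough files to fan out, as described above.

```python
import math


def gb_per_dpu_hour(input_gb: float, runtime_minutes: float, dpus: int) -> float:
    """Throughput observed in one past run, normalized to GB per DPU-hour."""
    dpu_hours = dpus * runtime_minutes / 60.0
    return input_gb / dpu_hours


def dpus_for_target(input_gb: float, target_minutes: float,
                    baseline_gb_per_dpu_hour: float, min_dpus: int = 2) -> int:
    """Rough DPU count needed to finish within target_minutes.

    Assumes linear scaling with DPUs (an assumption, not a guarantee).
    min_dpus is a hypothetical floor; adjust it to your job type.
    """
    gb_per_dpu = baseline_gb_per_dpu_hour * target_minutes / 60.0
    return max(min_dpus, math.ceil(input_gb / gb_per_dpu))


# Hypothetical past run: 50 GB converted in 30 minutes on 10 DPUs.
baseline = gb_per_dpu_hour(50, 30, 10)
print(baseline)  # 10.0 GB per DPU-hour

# Hypothetical new task: 200 GB that should finish within 60 minutes.
print(dpus_for_target(200, 60, baseline))  # 20
```

If a later run on similar data comes in well below your own baseline, that is a hint the job may be under-provisioned or the input is not fanning out, and the capacity metrics described on the page linked above are the place to look next.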

AWS
answered 5 years ago
EXPERT
reviewed a month ago
