Lake Formation table read performance


I'm evaluating Lake Formation and seeing abysmally slow table read times. For example, both wr.lakeformation.read_sql_table and wr.athena.read_sql_table need 50-60 sec to read a small 104x1000 table. For comparison, a direct S3 Parquet read with wr.s3.read_parquet needs only 3-4 sec to read the same table.

Do those numbers make any sense? Can I optimise Lake Formation read performance...

The performance test code is at https://gist.github.com/staskh/f25b52f97f6775d96992f9785b0e2019

asked a year ago · 286 views
1 Answer
Accepted Answer

Optimizing performance usually requires analyzing the specific data and query case by case, and the bottleneck has to be identified before any recommendation can be made. Each scenario can have its own factors contributing to slow reads. To answer your question in more detail, we would need information that is not public. Please open a support case with AWS.

Based on the information provided, it is plausible that the Athena query takes longer than a direct read for a small dataset. A query through Athena executes remotely and involves additional steps and API calls. Athena is also designed for large-scale data and relies on distributed computing, which adds overhead compared with local processing on a single node. That overhead becomes much more noticeable with small datasets.
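To see how much of the time is fixed overhead rather than data transfer, it can help to time each read path the same way and compare medians. The snippet below is only a minimal sketch, not the code from the linked gist: `time_reader`, `compare_readers`, and `build_aws_readers` are hypothetical helper names, the database, table, and S3 path are placeholders you would supply, and the `ctas_approach=False` variant is included as an assumption worth measuring, not as a confirmed fix.

```python
import time
from statistics import median
from typing import Callable, Dict


def time_reader(read: Callable[[], object], repeats: int = 3) -> float:
    """Return the median wall-clock seconds over `repeats` calls of `read`."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        read()
        samples.append(time.perf_counter() - start)
    return median(samples)


def compare_readers(
    readers: Dict[str, Callable[[], object]], repeats: int = 3
) -> Dict[str, float]:
    """Time every reader and return {name: median seconds}."""
    return {name: time_reader(read, repeats) for name, read in readers.items()}


def build_aws_readers(
    database: str, table: str, s3_path: str
) -> Dict[str, Callable[[], object]]:
    """Hypothetical helper: wrap the three read paths from the question.

    awswrangler is imported lazily so the timing helpers above can be
    used (and tested) without AWS credentials.
    """
    import awswrangler as wr

    return {
        "s3.read_parquet": lambda: wr.s3.read_parquet(path=s3_path),
        "athena.read_sql_table": lambda: wr.athena.read_sql_table(
            table=table, database=database
        ),
        # Assumption: skipping the CTAS step may reduce fixed overhead
        # for small results; measure before relying on it.
        "athena.read_sql_table (ctas_approach=False)": lambda: (
            wr.athena.read_sql_table(
                table=table, database=database, ctas_approach=False
            )
        ),
        "lakeformation.read_sql_table": lambda: wr.lakeformation.read_sql_table(
            table=table, database=database
        ),
    }
```

With credentials configured, something like `compare_readers(build_aws_readers("my_db", "my_table", "s3://my-bucket/my_table/"))` (all three arguments are placeholders) would return one median per read path. If the Athena and Lake Formation timings barely change as the table grows, the cost is mostly per-query overhead rather than data volume.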

AWS
JJ_L
answered a year ago
