Lake Formation table read performance


I'm evaluating Lake Formation and seeing abysmally slow table reads. For example, both wr.lakeformation.read_sql_table and wr.athena.read_sql_table need 50-60 sec to read a small 104x1000 table. For comparison, a direct S3 Parquet read with wr.s3.read_parquet needs only 3-4 sec to read the same table.

Do those numbers make any sense? Can I optimise Lake Formation read performance?

The performance test code is at https://gist.github.com/staskh/f25b52f97f6775d96992f9785b0e2019
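For readers who cannot open the gist, a comparison of this kind might look roughly like the sketch below. It is not the code from the gist: `time_call`, `benchmark`, and `build_readers` are hypothetical helper names, and the database, table, and S3 path are placeholders to be supplied by the caller. It assumes awswrangler is installed and AWS credentials are configured.

```python
import time
from typing import Any, Callable, Dict


def time_call(fn: Callable[[], Any], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def benchmark(readers: Dict[str, Callable[[], Any]], repeats: int = 3) -> Dict[str, float]:
    """Time every reader and return a mapping of reader name to best time in seconds."""
    return {name: time_call(fn, repeats) for name, fn in readers.items()}


def build_readers(database: str, table: str, s3_path: str) -> Dict[str, Callable[[], Any]]:
    """Build the three read paths mentioned in the question (all arguments are placeholders)."""
    # Imported lazily so the timing helpers can be used without awswrangler installed.
    import awswrangler as wr

    return {
        "s3.read_parquet": lambda: wr.s3.read_parquet(path=s3_path),
        "athena.read_sql_table": lambda: wr.athena.read_sql_table(
            table=table, database=database
        ),
        "lakeformation.read_sql_table": lambda: wr.lakeformation.read_sql_table(
            table=table, database=database
        ),
    }
```

Calling `benchmark(build_readers(database, table, s3_path))` with your own names returns one timing per read path. Taking the best of several runs reduces the influence of one-off effects such as cold connections.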

asked a year ago · 286 views
1 Answer
Accepted Answer

Optimizing performance usually requires analyzing the specific data and query on a case-by-case basis, because the bottleneck has to be identified before any recommendation can be made. Each scenario may have its own factors that contribute to slow reads. To answer your question properly, we would need details that are non-public information. Please open a support case with AWS using the following link.

Based on the information provided, it is plausible that the Athena query takes longer than a direct read for a small dataset. A query through Athena executes remotely and involves additional steps and API calls. Athena is also designed for large-scale data and uses distributed computing, which adds overhead compared to local processing on a single node. That overhead becomes much more noticeable when working with small datasets.
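One knob that may reduce this fixed overhead on the awswrangler side is the `ctas_approach` parameter of `wr.athena.read_sql_table`. By default the library wraps the query in a CTAS statement, which its documentation describes as faster for mid-size and large results; with `ctas_approach=False` it parses the regular Athena result instead, which is documented as having less latency for small results. The sketch below is an assumption about what might help, not a measured fix for this case; `read_small_table` and its `wr_module` parameter are hypothetical names used only for illustration.

```python
from typing import Any, Optional


def read_small_table(table: str, database: str, wr_module: Optional[Any] = None) -> Any:
    """Read a small table through Athena without the CTAS wrapping step.

    `wr_module` exists only so the awswrangler module can be swapped out
    (for example in tests); by default awswrangler itself is imported.
    """
    if wr_module is None:
        # Imported lazily so this module loads without awswrangler installed.
        import awswrangler as wr_module

    # ctas_approach=False skips the temporary CTAS table and reads the
    # regular Athena query result, which tends to suit small result sets.
    return wr_module.athena.read_sql_table(
        table=table,
        database=database,
        ctas_approach=False,
    )
```

This does not apply to `wr.lakeformation.read_sql_table`, which takes a different path, and it does not remove the remote execution cost itself.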

AWS
JJ_L
answered a year ago
