Lake Formation table read performance


I'm evaluating Lake Formation and seeing abysmally slow table reads. For example, both wr.lakeformation.read_sql_table and wr.athena.read_sql_table need 50-60 sec to read a small 104x1000 table. For comparison, a direct S3 Parquet read with wr.s3.read_parquet needs only 3-4 sec to read the same table.

Do those numbers make sense? Can I optimise Lake Formation read performance?

The performance test code is at https://gist.github.com/staskh/f25b52f97f6775d96992f9785b0e2019
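The gist itself is not reproduced here. As a rough illustration of the kind of comparison described above, here is a minimal sketch that times the three read paths with the same helper. The database, table, and S3 path names are hypothetical placeholders, and `time_call`, `build_readers`, and `run_benchmark` are illustrative helpers, not part of awswrangler.

```python
import time
from typing import Any, Callable, Dict, Tuple


def time_call(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def build_readers(database: str, table: str, s3_path: str) -> Dict[str, Callable[[], Any]]:
    """Return zero-argument callables for the three read paths being compared."""
    # Imported lazily: calling these readers needs awswrangler and AWS credentials.
    import awswrangler as wr

    return {
        "lakeformation": lambda: wr.lakeformation.read_sql_table(table=table, database=database),
        "athena": lambda: wr.athena.read_sql_table(table=table, database=database),
        "s3_parquet": lambda: wr.s3.read_parquet(path=s3_path),
    }


def run_benchmark(readers: Dict[str, Callable[[], Any]]) -> Dict[str, float]:
    """Time each reader once and return elapsed seconds keyed by reader name."""
    timings = {}
    for name, reader in readers.items():
        _, elapsed = time_call(reader)
        timings[name] = elapsed
    return timings
```

Usage would look like `run_benchmark(build_readers("my_db", "my_table", "s3://my-bucket/my_table/"))`, where all three arguments are placeholders. A single run per reader is noisy, so repeating the measurement a few times gives a fairer picture.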

Asked 1 year ago · 286 views
1 Answer
Accepted Answer

Optimizing performance usually requires analyzing the specific data and query on a case-by-case basis, and the bottleneck has to be identified before any recommendation can be made. To better answer your question, we require details that are non-public information. Please open a support case with AWS using the following link.

Based on the information provided, it is plausible that the Athena query takes longer than a local query for a small dataset. A query through Athena executes remotely, which involves additional steps and API calls. Athena is also designed for large-scale data and uses distributed computing, which adds overhead compared with local processing on a single node. That overhead becomes much more noticeable with smaller datasets.
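One of those additional steps may come from the client library rather than Athena itself. By default, wr.athena.read_sql_table uses `ctas_approach=True`, which wraps the query in a CTAS statement that writes a temporary Parquet table before reading it back. The awswrangler documentation describes `ctas_approach=False` as having lower latency for small result sizes. The sketch below shows how that setting could be tried. Whether it helps in this particular case is an assumption that has not been measured here, and `read_small_table` with its `reader` parameter is a hypothetical wrapper added only so the call can be exercised without AWS access.

```python
from typing import Any, Callable, Optional


def read_small_table(
    table: str,
    database: str,
    reader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Read a table through Athena with the CTAS wrapping step disabled.

    `reader` defaults to wr.athena.read_sql_table; it is injectable only so the
    call can be tested without AWS credentials (an assumption of this sketch).
    """
    if reader is None:
        # Imported lazily: requires awswrangler and AWS credentials.
        import awswrangler as wr

        reader = wr.athena.read_sql_table
    # ctas_approach=False runs a regular query and parses its output directly,
    # skipping the temporary-table step.
    return reader(table=table, database=database, ctas_approach=False)
```

For example, `read_small_table("my_table", "my_db")` with placeholder names. Timing this call against the default setting would show whether the CTAS step accounts for a meaningful share of the 50-60 sec.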

AWS
JJ_L
answered 1 year ago
