Does Athena reuse PostgreSQL internal indexes?

1

Hello, I've setup Athena to query my PostgreSQL database and it works fine. My question is whether it leverages the existing indexes in my tables for improved performance?

Mehdi
已提问 9 个月前247 查看次数
2 回答
0
已接受的回答

Hello there,

Based on the query description, I assume that you are querying PostgreSQL Database using Amazon Athena Federated query which supports PostgreSQL as data source.

The Amazon Athena PostgreSQL connector enables Athena to access your PostgreSQL databases. Athena uses data source connectors that run on AWS Lambda to run federated queries. A data source connector is a piece of code that can translate between your target data source and Athena. You can think of a connector as an extension of Athena's query engine. With PostgreSQL as data source, the Lambda function data source connector is deployed and the Lambda function is connected using JDBC driver to the PostgreSQL data source .

When a query is submitted against a data source, Athena invokes the corresponding connector to identify parts of the tables that need to be read, manages parallelism, and pushes down filter predicates.

The Athena PostgreSQL connector performs predicate pushdown to decrease the data scanned by the query. LIMIT clauses, simple predicates, and complex expressions are pushed down to the connector to reduce the amount of data scanned and decrease query execution run time. However, selecting a subset of columns sometimes results in a longer query execution runtime.

[+] https://docs.aws.amazon.com/athena/latest/ug/connectors-postgresql.html#connectors-postgresql-performance

Now to answer your query, if PostgreSQL indexes are used when querying the DB from Amazon Athena. The Query running from the Amazon Athena on PostgreSQL data source would execute the statement on the PostgreSQL database as any SQL and hence, the native indexes would be used by PostgreSQL engine irrespective of from where the query is executed, whether it is from a JDBC client or from Amazon Athena Query editor.

I have tested this behavior in my lab wherein I have setup Athena with PostgreSQL data source and queried the Table with Where clause defined for the column which had index. I have configured auto_explain for the RDS PostgreSQL to log the explain plans of SQL queries executed on the DB and hence, I was able to verify the Explain plan generated by PostgreSQL for the SQL executed by Athena which confirmed that index scan was performed using the index available for the column queried in SQL.

You can perform similar test to verify the same at your end by enabling auto_explain in the shared_preload_libraries of PostgreSQL DB Parameter group and reviewing the Explain plan of the query that is executed from Amazon Athena. If you are using RDS as PostgreSQL DB, you can follow below document to log execution plans of SQL.

[+] https://repost.aws/knowledge-center/rds-postgresql-tune-query-performance

Hope the above information answers your concern. Thank you!

AWS
支持工程师
已回答 9 个月前
0

PostgreSQL supports native partitions. The Athena PostgreSQL connector can retrieve data from these partitions in parallel. If you want to query very large datasets with uniform partition distribution, native partitioning is highly recommended.

The Athena PostgreSQL connector performs predicate pushdown to decrease the data scanned by the query. LIMIT clauses, simple predicates, and complex expressions are pushed down to the connector to reduce the amount of data scanned and decrease query execution run time. However, selecting a subset of columns sometimes results in a longer query execution runtime.

Amazon Athena does not directly utilize existing indexes from your PostgreSQL database for query performance. It's a serverless query service that allows you to run SQL queries on data stored in Amazon S3 using the Presto query engine.

If you want to leverage indexes for improved query performance, you would typically need to create appropriate indexes directly on the data stored in Amazon S3. This could involve using partitioning and columnar storage formats like Parquet or ORC, which can significantly improve query performance in Athena.

Instead, Athena optimizes query performance by utilizing a technique known as predicate pushdown. This means that Athena's query engine will try to apply filters as early as possible in the query processing, reducing the amount of data that needs to be read from S3.

References

[1] https://docs.aws.amazon.com/athena/latest/ug/connectors-postgresql.html#connectors-postgresql-performance

[2] https://docs.aws.amazon.com/athena/latest/ug/athena-explain-statement.html#athena-explain-statement

[3] Third-party link - https://www.postgresqltutorial.com/postgresql-indexes/postgresql-create-index/

profile picture
专家
已回答 9 个月前

您未登录。 登录 发布回答。

一个好的回答可以清楚地解答问题和提供建设性反馈,并能促进提问者的职业发展。

回答问题的准则