Does Athena reuse PostgreSQL internal indexes?

1

Hello, I've setup Athena to query my PostgreSQL database and it works fine. My question is whether it leverages the existing indexes in my tables for improved performance?

Mehdi
asked 9 months ago233 views
2 Answers
0
Accepted Answer

Hello there,

Based on the query description, I assume that you are querying PostgreSQL Database using Amazon Athena Federated query which supports PostgreSQL as data source.

The Amazon Athena PostgreSQL connector enables Athena to access your PostgreSQL databases. Athena uses data source connectors that run on AWS Lambda to run federated queries. A data source connector is a piece of code that can translate between your target data source and Athena. You can think of a connector as an extension of Athena's query engine. With PostgreSQL as data source, the Lambda function data source connector is deployed and the Lambda function is connected using JDBC driver to the PostgreSQL data source .

When a query is submitted against a data source, Athena invokes the corresponding connector to identify parts of the tables that need to be read, manages parallelism, and pushes down filter predicates.

The Athena PostgreSQL connector performs predicate pushdown to decrease the data scanned by the query. LIMIT clauses, simple predicates, and complex expressions are pushed down to the connector to reduce the amount of data scanned and decrease query execution run time. However, selecting a subset of columns sometimes results in a longer query execution runtime.

[+] https://docs.aws.amazon.com/athena/latest/ug/connectors-postgresql.html#connectors-postgresql-performance

Now to answer your query, if PostgreSQL indexes are used when querying the DB from Amazon Athena. The Query running from the Amazon Athena on PostgreSQL data source would execute the statement on the PostgreSQL database as any SQL and hence, the native indexes would be used by PostgreSQL engine irrespective of from where the query is executed, whether it is from a JDBC client or from Amazon Athena Query editor.

I have tested this behavior in my lab wherein I have setup Athena with PostgreSQL data source and queried the Table with Where clause defined for the column which had index. I have configured auto_explain for the RDS PostgreSQL to log the explain plans of SQL queries executed on the DB and hence, I was able to verify the Explain plan generated by PostgreSQL for the SQL executed by Athena which confirmed that index scan was performed using the index available for the column queried in SQL.

You can perform similar test to verify the same at your end by enabling auto_explain in the shared_preload_libraries of PostgreSQL DB Parameter group and reviewing the Explain plan of the query that is executed from Amazon Athena. If you are using RDS as PostgreSQL DB, you can follow below document to log execution plans of SQL.

[+] https://repost.aws/knowledge-center/rds-postgresql-tune-query-performance

Hope the above information answers your concern. Thank you!

AWS
SUPPORT ENGINEER
answered 8 months ago
0

PostgreSQL supports native partitions. The Athena PostgreSQL connector can retrieve data from these partitions in parallel. If you want to query very large datasets with uniform partition distribution, native partitioning is highly recommended.

The Athena PostgreSQL connector performs predicate pushdown to decrease the data scanned by the query. LIMIT clauses, simple predicates, and complex expressions are pushed down to the connector to reduce the amount of data scanned and decrease query execution run time. However, selecting a subset of columns sometimes results in a longer query execution runtime.

Amazon Athena does not directly utilize existing indexes from your PostgreSQL database for query performance. It's a serverless query service that allows you to run SQL queries on data stored in Amazon S3 using the Presto query engine.

If you want to leverage indexes for improved query performance, you would typically need to create appropriate indexes directly on the data stored in Amazon S3. This could involve using partitioning and columnar storage formats like Parquet or ORC, which can significantly improve query performance in Athena.

Instead, Athena optimizes query performance by utilizing a technique known as predicate pushdown. This means that Athena's query engine will try to apply filters as early as possible in the query processing, reducing the amount of data that needs to be read from S3.

References

[1] https://docs.aws.amazon.com/athena/latest/ug/connectors-postgresql.html#connectors-postgresql-performance

[2] https://docs.aws.amazon.com/athena/latest/ug/athena-explain-statement.html#athena-explain-statement

[3] Third-party link - https://www.postgresqltutorial.com/postgresql-indexes/postgresql-create-index/

profile picture
EXPERT
answered 9 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions