The performance difference you're experiencing is likely due to Athena still needing to interact with the AWS Glue Data Catalog during the query planning phase, even when partition projection is enabled. While partition projection allows Athena to calculate partition locations without retrieving partition metadata, the table itself and its properties are still stored in and retrieved from the Glue Catalog.
When you have a table with a large number of pre-existing partitions registered in the Glue Catalog (Table A), there may be additional overhead during the initial catalog lookup and table metadata retrieval, even though Athena won't use those specific partition entries for the query. The catalog still contains all that legacy partition metadata, which could impact the speed of catalog operations.
In contrast, Table B was created fresh with partition projection from the start and never accumulated partition metadata in the catalog, resulting in a cleaner, more efficient catalog lookup.
To potentially improve Table A's performance, you could consider:
- Dropping the legacy partition metadata from the Glue Catalog while keeping partition projection enabled. Since partition projection calculates partitions dynamically, Athena doesn't need the old partition entries.
- Recreating the table with partition projection enabled from the start (similar to Table B), which would give you a clean slate without legacy metadata.
- Verifying your partition projection configuration: check that your queries include predicates on all partition keys and that the configured ranges are reasonable, as unnecessarily broad ranges can cause performance issues.
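For the first option, a minimal sketch of clearing the registered partitions with boto3 might look like the following. The function names `chunked` and `drop_catalog_partitions` are hypothetical helpers, not part of any AWS library; the Glue calls used are `get_partitions` (through its paginator) and `batch_delete_partition`, which accepts at most 25 partitions per request. This assumes an external table, where removing catalog entries leaves the underlying S3 data in place, and it assumes the table is only queried through Athena, since other engines reading the same table still depend on the catalog partitions.

```python
from typing import Any, Iterator, List

# batch_delete_partition accepts at most 25 partitions per call.
BATCH_LIMIT = 25


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def drop_catalog_partitions(glue: Any, database: str, table: str) -> int:
    """Delete every partition registered for the table in the Glue Catalog.

    `glue` is a Glue client, e.g. boto3.client("glue").
    Returns the number of partitions deleted without a reported error.
    """
    # Collect all partition values first so we never paginate over a
    # listing that is being modified at the same time.
    paginator = glue.get_paginator("get_partitions")
    values: List[List[str]] = []
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        values.extend(partition["Values"] for partition in page["Partitions"])

    deleted = 0
    for batch in chunked(values, BATCH_LIMIT):
        response = glue.batch_delete_partition(
            DatabaseName=database,
            TableName=table,
            PartitionsToDelete=[{"Values": v} for v in batch],
        )
        # Failed deletions are reported per partition in "Errors".
        deleted += len(batch) - len(response.get("Errors", []))
    return deleted
```

Usage would be along the lines of `drop_catalog_partitions(boto3.client("glue"), "my_database", "impressions_pp")`, where the database name is a placeholder. Holding every partition value in memory keeps the sketch simple; for a table with millions of registered partitions you may prefer to delete in several passes, and it is worth trying this on a copy of the table first.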
The key insight is that while partition projection bypasses partition metadata retrieval for query execution, the overall table metadata and catalog interaction can still be affected by the presence of large amounts of legacy partition data in the Glue Catalog.
Sources
Optimize data - Amazon Athena
Optimize queries with AWS Glue partition indexing and filtering - Amazon Athena
Update: I found an additional factor. The two tables had different projection.year.range settings:
- Fast table (impressions_pp_test): 'projection.year.range'='2026,2027' (2 years)
- Slow table (impressions_pp): 'projection.year.range'='2024,2060' (37 years)
After narrowing the slow table's range to '2024,2027', planning time dropped from ~76 seconds to ~9 seconds, even though the query already had WHERE year = 2026.
This suggests that partition projection computes the full projection space from the range configuration before applying WHERE clause pruning. The wider the range, the more combinations are generated upfront (year × month × day × hour × minute), which significantly impacts planning time for minute-level partitioned tables.
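To put rough numbers on this, here is a back-of-the-envelope sketch of how many combinations each range setting implies. The per-year factors (12 months, 31 days, 24 hours, 60 minutes) are assumptions about the other projection ranges, which I haven't listed above, and the helper names are made up for illustration. It only counts the candidate space; it does not confirm that Athena actually enumerates all of it.

```python
def years_in_range(range_value: str) -> int:
    """Count the years covered by a 'start,end' projection range (inclusive)."""
    start, end = (int(part) for part in range_value.split(","))
    return end - start + 1


def projected_partitions(
    year_range: str,
    months: int = 12,   # assumed month range 1,12
    days: int = 31,     # assumed day range 1,31
    hours: int = 24,    # assumed hour range 0,23
    minutes: int = 60,  # assumed minute range 0,59
) -> int:
    """Upper bound on year x month x day x hour x minute combinations."""
    return years_in_range(year_range) * months * days * hours * minutes


for label, year_range in [
    ("fast table", "2026,2027"),
    ("slow table, original", "2024,2060"),
    ("slow table, narrowed", "2024,2027"),
]:
    print(f"{label}: {projected_partitions(year_range):,} combinations")
```

Under these assumptions the original range implies about 19.8 million combinations versus about 2.1 million after narrowing, a ratio of 9.25. That is roughly in line with the observed drop from ~76 seconds to ~9 seconds, which would be consistent with planning time growing with the size of the configured range rather than with the partitions the WHERE clause selects.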
I couldn't find any documentation mentioning this behavior. Is this expected? Are there any best practices for setting projection ranges on tables with high-granularity partitions?
