- Newest
- Most votes
- Most comments
Hi alexandrec,
If you query a partitioned table and do not specify the partition in a WHERE clause then Athena will scan the entire table. If you do specify a partition with a WHERE clause then Athena will scan only the data in the specified partition. This is mentioned in Amazon Athena documentation [1].
This would be why the first query scans the partitions correctly but the second one does not.
I hope this answers your question about the behavior you experience with this query. You may also benefit from [2] which outlines some Athena performance tuning tips.
[1] https://docs.aws.amazon.com/athena/latest/ug/partitions.html#partitions-considerations-limitations
[2] https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
JOIN clause conditions should be understood by the underlying engine as filter predicates. No where in the documentation it is said anything about that specifically. Otherwise, Athena becomes a huge money sink for very large tables with thousands of partitions. The only solution is to reduce the amount of partitions, so we scan less data, but that comes at the cost of losing data in the table. Where can I get help from an AWS Athena engineer?
Hi alexandrec,
You can raise a technical support case at [1] to seek assistance from an AWS Athena engineer. Note that you will need a support plan that is Developer or above. See [2] for details on creating a support case with AWS.
[1] https://console.aws.amazon.com/support/home#/case/create?issueType=technical [2] https://docs.aws.amazon.com/awssupport/latest/user/case-management.html
Relevant content
- asked 5 months ago
- asked 7 months ago
- AWS OFFICIALUpdated a year ago
- AWS OFFICIALUpdated a year ago
- AWS OFFICIALUpdated a year ago
- AWS OFFICIALUpdated a year ago
Did you try using a WHERE condition in the final select statement as a DUMMY statement. eg.
where 1=1
or
where fr.timestamp > 1671633935
Yes, dummy statements do not work.
@alexandrec, how did you end up working around this issue? One brute solution I have thought of is to have something run two queries, instead of one. IE. have something like a lambda or Step Functions Athena runner run the first query. Then have a second Lambda or Step Function step takes the results of the first query and feed them in as literals to the second query.