Does Athena support query caching?

0

Hi,
I am just trying to find if Athena supports caching like if the same query is executed again and again, will it fetch the data from cache instead of processing the storage and adding to the cost . I have checked the documentation and other forums but I could not find any instance where it says that 'Athena supports caching'

Please pardon me if I am missing something very obvious but refer me to the documentation links or some blog which answers this question.

Appreciate a response asap.

Thanks,
Ppeddi

ppeddi
asked 6 years ago2082 views
3 Answers
0

No, Athena doesn't support query caching. I'm not sure it will ever as it's decoupling the compute (Athena) from storage (S3) so writes for instance are not going through Athena therefore managing the cache eviction wouldn't be possible from Athena alone. What you can do is implement this at the application level though. One thing for instance that I've done is using the Java SDK (instead of JDBC) you can get the execution ID of a query and then store the execution ID and refetch the results multiple times at a later point in time (cached fetch). You would just need to define in your application though when those results would become stale e.g. is it based on some elapsed time or if your application has visibility on all writes to S3 it could evict the cached result at that point.

If you look up caching in Presto there's some things like Qubole has added intermediate caching of the files that Presto worker nodes have to fetch from S3 e.g. - https://www.qubole.com/blog/caching-presto/ - this would be an internal optimization of Athena though and probably is not done, I'm just giving it as an example of what some have done with Presto (which Athena is built on).

Edited by: rruppmgp on Oct 16, 2018 1:56 PM

answered 6 years ago
0

If a user A runs query SELECT * FROM <sometable> where DATE>"2019-08-08"
then Athena will run that query and get the results from S3 and also an execution ID along with it.
Now, if the Application logic saves that execution ID in a cache along with results then will the execution ID be the same for user B who runs the same query the very next second ?

answered 5 years ago
0

For what I described you're basically just reading the results subsequent times by using that cached execution ID. This needs to be done via the API directly not via JDBC. The API operation is GetQueryResults, see here - http://docs.aws.amazon.com/goto/WebAPI/athena-2017-05-18/GetQueryResults

So if some query is submitted (StartQueryExecution - http://docs.aws.amazon.com/goto/WebAPI/athena-2017-05-18/StartQueryExecution) you get an executionID back e.g. "1234". If you cache the execution ID and have some logic then to know when to read from the cache (and when to invalid it) then you can just skip the "submit query" part and go directly to GetQueryResults with the execution ID "1234". How you implement the caching logic is up to you though, the main takeaway is that you can re-read results in Athena via the executionID (it's just reading the results from S3 that it has written to after the query finishes). This same mechanism is used by the Athena history page to be able to view old query results + their results. Athena does have some limit of I think 45 days that it will retain query history though so you wouldn't want to cache longer than that!

answered 5 years ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions