Does Athena support query caching?

0

Hi,
I am just trying to find if Athena supports caching like if the same query is executed again and again, will it fetch the data from cache instead of processing the storage and adding to the cost . I have checked the documentation and other forums but I could not find any instance where it says that 'Athena supports caching'

Please pardon me if I am missing something very obvious but refer me to the documentation links or some blog which answers this question.

Appreciate a response asap.

Thanks,
Ppeddi

ppeddi
질문됨 6년 전2100회 조회
3개 답변
0

No, Athena doesn't support query caching. I'm not sure it will ever as it's decoupling the compute (Athena) from storage (S3) so writes for instance are not going through Athena therefore managing the cache eviction wouldn't be possible from Athena alone. What you can do is implement this at the application level though. One thing for instance that I've done is using the Java SDK (instead of JDBC) you can get the execution ID of a query and then store the execution ID and refetch the results multiple times at a later point in time (cached fetch). You would just need to define in your application though when those results would become stale e.g. is it based on some elapsed time or if your application has visibility on all writes to S3 it could evict the cached result at that point.

If you look up caching in Presto there's some things like Qubole has added intermediate caching of the files that Presto worker nodes have to fetch from S3 e.g. - https://www.qubole.com/blog/caching-presto/ - this would be an internal optimization of Athena though and probably is not done, I'm just giving it as an example of what some have done with Presto (which Athena is built on).

Edited by: rruppmgp on Oct 16, 2018 1:56 PM

답변함 6년 전
0

If a user A runs query SELECT * FROM <sometable> where DATE>"2019-08-08"
then Athena will run that query and get the results from S3 and also an execution ID along with it.
Now, if the Application logic saves that execution ID in a cache along with results then will the execution ID be the same for user B who runs the same query the very next second ?

답변함 5년 전
0

For what I described you're basically just reading the results subsequent times by using that cached execution ID. This needs to be done via the API directly not via JDBC. The API operation is GetQueryResults, see here - http://docs.aws.amazon.com/goto/WebAPI/athena-2017-05-18/GetQueryResults

So if some query is submitted (StartQueryExecution - http://docs.aws.amazon.com/goto/WebAPI/athena-2017-05-18/StartQueryExecution) you get an executionID back e.g. "1234". If you cache the execution ID and have some logic then to know when to read from the cache (and when to invalid it) then you can just skip the "submit query" part and go directly to GetQueryResults with the execution ID "1234". How you implement the caching logic is up to you though, the main takeaway is that you can re-read results in Athena via the executionID (it's just reading the results from S3 that it has written to after the query finishes). This same mechanism is used by the Athena history page to be able to view old query results + their results. Athena does have some limit of I think 45 days that it will retain query history though so you wouldn't want to cache longer than that!

답변함 5년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인