Hi all,
I run some AWS Athena queries with the ResultReuseByAgeConfiguration option set. I would expect Athena to cache the results, but it seems to be caching too aggressively. I am using parameterised queries, and Athena does not appear to take the parameter values into consideration when deciding whether a cached result can be reused. Is this expected behaviour, or possibly a bug in how Athena manages caching?
To demonstrate, I created a very simple prepared statement in Athena:
PREPARE TEST_CACHE FROM
SELECT ?
Running the following code, you would expect all tests to pass (the value passed in and the value returned are the same):
import boto3
import time

def query(v, reuse=False):
    athena = boto3.client('athena')
    query_id1 = athena.start_query_execution(
        QueryString='execute TEST_CACHE',
        ExecutionParameters=[v],
        QueryExecutionContext={'Database': 'default'},
        WorkGroup='ca-main-athena',
        ResultReuseConfiguration={
            'ResultReuseByAgeConfiguration': {
                'Enabled': reuse,
                'MaxAgeInMinutes': 60
            }
        }
    )['QueryExecutionId']
    # Poll until the query leaves the QUEUED/RUNNING states
    while True:
        finish_state = athena.get_query_execution(QueryExecutionId=query_id1)["QueryExecution"]["Status"]["State"]
        if finish_state == "RUNNING" or finish_state == "QUEUED":
            time.sleep(1)
        else:
            break
    athenaresult1 = athena.get_query_results(QueryExecutionId=query_id1)
    # Rows[0] is the header row, so the returned value is in Rows[1]
    r = athenaresult1['ResultSet']['Rows'][1]['Data'][0]['VarCharValue']
    if r == v:
        print(f"result match - expected {v} and got {r}")
    else:
        print(f"result does not match - expected {v} and got {r}")

# works as expected - result reuse is turned off
query("alpha", False)
query("bravo", False)
query("charlie", False)

# does not work as expected - result reuse is turned on
query("delta", True)
query("echo", True)
query("foxtrot", True)
The output shows that the caching is not correct: with result reuse turned on, every call returns the earlier result (charlie) instead of the value passed in.
$ python test_athena.py
result match - expected alpha and got alpha
result match - expected bravo and got bravo
result match - expected charlie and got charlie
result does not match - expected delta and got charlie
result does not match - expected echo and got charlie
result does not match - expected foxtrot and got charlie
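To make the suspicion concrete, here is a toy model of the behaviour I think I am seeing. It is only a sketch and not how Athena is actually implemented: make_runner, replay and the in-memory dict are names I made up for illustration. It assumes every fresh run stores its result, and that a reuse-enabled run looks up a cached result by a key that either ignores or includes the execution parameters.

```python
from typing import Callable, Dict, List, Tuple

# Toy model only: NOT Athena's implementation. All names here are hypothetical.
def make_runner(include_params_in_key: bool) -> Callable[[str, List[str], bool], str]:
    cache: Dict[Tuple, str] = {}

    def run(query_string: str, params: List[str], reuse: bool) -> str:
        # The suspected bug: the key is built from the query string alone.
        if include_params_in_key:
            key = (query_string, tuple(params))
        else:
            key = (query_string,)
        if reuse and key in cache:
            return cache[key]
        result = params[0]   # "SELECT ?" simply echoes the bound parameter
        cache[key] = result  # every fresh run becomes the newest reusable result
        return result

    return run

def replay(run: Callable[[str, List[str], bool], str]) -> List[str]:
    # Same sequence of calls as in the test script above.
    calls = [("alpha", False), ("bravo", False), ("charlie", False),
             ("delta", True), ("echo", True), ("foxtrot", True)]
    return [run("execute TEST_CACHE", [value], reuse) for value, reuse in calls]

print(replay(make_runner(include_params_in_key=False)))
print(replay(make_runner(include_params_in_key=True)))
```

With the parameters left out of the key, the model returns charlie for delta, echo and foxtrot, which matches the output above. With the parameters included, each call gets its own value back, which is what I would expect from Athena.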
We have resolved this issue. You should be able to see the intended behaviour for parameterized queries with caching now.
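For anyone who wants to confirm this on their side, one option is to check whether a given execution actually reused a previous result. The sketch below assumes the get_query_execution response reports this under QueryExecution > Statistics > ResultReuseInformation > ReusedPreviousResult. The helper name reused_previous_result and the sample dicts are made up for illustration and are not real Athena output.

```python
from typing import Any, Dict

def reused_previous_result(query_execution_response: Dict[str, Any]) -> bool:
    """Return True if the response says a previous result was reused.

    Assumes the response shape of athena.get_query_execution(...);
    missing keys are treated as "not reused".
    """
    stats = query_execution_response.get("QueryExecution", {}).get("Statistics", {})
    reuse_info = stats.get("ResultReuseInformation", {})
    return bool(reuse_info.get("ReusedPreviousResult", False))

# Hand-written sample responses (hypothetical, trimmed to the relevant keys).
sample_reused = {
    "QueryExecution": {
        "Statistics": {"ResultReuseInformation": {"ReusedPreviousResult": True}}
    }
}
sample_fresh = {"QueryExecution": {"Statistics": {}}}

print(reused_previous_result(sample_reused))
print(reused_previous_result(sample_fresh))
```

In the test script above, this could be called on the response already fetched in the polling loop. With the fix in place, a reuse-enabled call with a parameter value that has not been run before should report that no previous result was reused.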