Performance issue: Querying Athena using Boto3


I am querying Athena using Boto3 from a Python script. Most of the queries return more than 1000 records, so I use 'NextToken' in 'get_query_results' to fetch the subsequent records. I observed that each query takes a long time (minutes) to complete; fetching 1171 records took more than a minute. Surprisingly, the data size is small, in the KB range. I am not sure whether this behavior is the result of wrong usage on my part or whether there are underlying technical reasons. Are there any specific areas that I should look at? I appreciate any help with this.

asked 2 years ago, 1633 views
1 Answer

Hi,

Since I haven't seen your code, I would suggest checking the following areas:

  1. Authentication is done only once. Ensure that you authenticate a single time and that the subsequent queries reuse this initial authentication.
  2. Related to the above, if you are using IAM roles, there is a limit on the number of calls you can make to the instance metadata to get the access key and secret key.
  3. Add logging at each point to find out exactly which line of the code takes the time. If you have already done this and found that the query itself is slow, then you will have to check exactly what you are querying and how.
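For point 3, a minimal sketch of per-step timing, using only the standard library. The names `timed`, `report`, and `timings` are made up for this illustration and are not part of Boto3.

```python
import time
from contextlib import contextmanager

# Hypothetical helper: label -> list of elapsed seconds for each timed block.
timings = {}


@contextmanager
def timed(label):
    """Record how long the wrapped block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)


def report():
    """Print the call count and total elapsed time for each label."""
    for label, samples in timings.items():
        print(f"{label}: {len(samples)} call(s), {sum(samples):.3f}s total")


# Usage (assumed call sites, adapt to your own code):
#     with timed("get_query_results"):
#         results = client.get_query_results(QueryExecutionId=qid, MaxResults=1000)
#     with timed("append"):
#         dataframe = append_data(dataframe, columns, results)
#     report()
```

Wrapping the session creation, the polling loop, each `get_query_results` call, and the DataFrame handling separately should show which of them accounts for the minute.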

I'm not an expert, but I hope this helps. If you have more information to add, or if you have found the solution yourself, please post a response so that we can discuss it further.

Regards

Vignesh N

answered 2 years ago
  • Hi Vignesh

    Thank you for your response. Here is a snippet of what my code does (the lines are copied from my Python code).

    self.sess = boto3.Session(profile_name=profile, region_name=region)
    self.client = self.sess.client('athena')
    execution_id = self.client.start_query_execution(
        QueryString=queryStr,
        QueryExecutionContext={'Database': '....'},
        ResultConfiguration={'OutputLocation': 's3://...'},
        WorkGroup='AmazonAthenaLakeFormation')
    while True:
        stats = self.client.get_query_execution(QueryExecutionId=execution_id['QueryExecutionId'])
        status = stats['QueryExecution']['Status']['State']
        if status in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
            break
        time.sleep(0.5)
    if status == 'SUCCEEDED':
        results = self.client.get_query_results(
            QueryExecutionId=execution_id['QueryExecutionId'], MaxResults=1000)
        # returns a python pandas frame and column names
        dataframe, columns = self.returnResultsAsPandaFrame(results)
        while 'NextToken' in results:
            # this has performance impact
            results = self.client.get_query_results(
                QueryExecutionId=execution_id['QueryExecutionId'],
                MaxResults=1000, NextToken=results['NextToken'])
            # this is not having a performance impact
            dataframe = self.appendData(dataframe, columns, results)

    I believe that authentication is happening only once. (comments continued in next comment ...)
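For comparison, the same NextToken loop can be written with Boto3's built-in paginator for `get_query_results`. This is only a sketch and is not claimed to remove the delay reported above. The function name `fetch_all_rows` is made up for this illustration. It assumes a SELECT query, where the first returned row holds the column names, and it collects rows in a plain list so that a DataFrame can be built once at the end.

```python
def fetch_all_rows(client, query_execution_id, page_size=1000):
    """Return (header, rows) for a finished Athena query.

    `client` is expected to be a boto3 Athena client. The paginator
    passes NextToken between calls internally.
    """
    paginator = client.get_paginator("get_query_results")
    pages = paginator.paginate(
        QueryExecutionId=query_execution_id,
        PaginationConfig={"PageSize": page_size},
    )
    rows = []
    for page in pages:
        for row in page["ResultSet"]["Rows"]:
            # A cell without "VarCharValue" is treated as a NULL value.
            rows.append([cell.get("VarCharValue") for cell in row["Data"]])
    if not rows:
        return [], []
    # Assumption: SELECT query, so the first row is the column header.
    return rows[0], rows[1:]


# Usage (assumed, with a client created as in the snippet above):
#     header, rows = fetch_all_rows(client, execution_id["QueryExecutionId"])
#     dataframe = pandas.DataFrame(rows, columns=header)
```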

  • Does AWS Athena create execution logs for a query? Perhaps looking at the logs would provide some clues. I would appreciate any guidance on this.
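On the question of execution details: as far as I know, the `get_query_execution` response already polled in the snippet above carries a `Statistics` block with per-query timings. Below is a minimal sketch that pulls those fields out of a response dictionary. The function name `summarize_execution` and the output keys are made up for this illustration, and any field that is absent from the response comes back as `None`.

```python
def summarize_execution(response):
    """Extract state and timing statistics from a get_query_execution response."""
    execution = response["QueryExecution"]
    stats = execution.get("Statistics", {})
    return {
        "state": execution["Status"]["State"],
        "queue_ms": stats.get("QueryQueueTimeInMillis"),
        "planning_ms": stats.get("QueryPlanningTimeInMillis"),
        "engine_ms": stats.get("EngineExecutionTimeInMillis"),
        "service_ms": stats.get("ServiceProcessingTimeInMillis"),
        "total_ms": stats.get("TotalExecutionTimeInMillis"),
        "scanned_bytes": stats.get("DataScannedInBytes"),
    }


# Usage (assumed, after the polling loop has finished):
#     print(summarize_execution(stats))
```

If the total reported here is a second or two while the script takes a minute, the time is being spent on the client side or in result fetching rather than in the query engine.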
