Performance issue: Querying Athena using Boto3


I am querying Athena using Boto3 from a Python script. Most of the queries return more than 1000 records, so I am using 'NextToken' in 'get_query_results' to fetch the subsequent records. I observed that each query takes a significantly long time (minutes) to complete: fetching 1171 records took more than a minute. Surprisingly, the data size is small, only a few KB. I am not sure whether this behavior is the result of incorrect usage on my part or whether there are underlying technical reasons. Are there any specific areas I should look at? I appreciate any help with this.
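The `NextToken` loop described above can also be written with boto3's built-in paginator for `get_query_results`, which follows `NextToken` automatically. This is a minimal sketch, assuming `client` is an Athena client created once (for example with `boto3.client('athena')`) and that the query is a `SELECT`, whose first result row holds the column names; `fetch_all_rows` is a hypothetical helper name. It does not by itself make the calls faster, but it shows the arithmetic: with at most 1000 rows per call, 1171 records (plus the header row) need only two `get_query_results` calls, so the page loop alone should not normally account for a minute.

```python
def fetch_all_rows(client, query_execution_id, page_size=1000):
    """Return (columns, rows) for a finished Athena query.

    `client` is assumed to be a boto3 Athena client. The paginator follows
    NextToken for us; GetQueryResults returns at most 1000 rows per call.
    """
    paginator = client.get_paginator('get_query_results')
    pages = paginator.paginate(
        QueryExecutionId=query_execution_id,
        PaginationConfig={'PageSize': page_size},  # sent as MaxResults
    )
    all_rows = []
    for page in pages:
        for row in page['ResultSet']['Rows']:
            # Each cell is a dict; NULL cells may have no 'VarCharValue' key.
            all_rows.append([cell.get('VarCharValue') for cell in row['Data']])
    if not all_rows:
        return [], []
    # For SELECT queries, the first row of the first page is the header.
    columns, rows = all_rows[0], all_rows[1:]
    return columns, rows
```

The returned `columns` and `rows` can be passed straight to `pandas.DataFrame(rows, columns=columns)` if a DataFrame is needed.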

Asked 2 years ago, 1657 views
1 answer

Hi,

Since I don't know your code, I would suggest that you check the following areas:

  1. Authentication is done only once: make sure you authenticate a single time and that subsequent queries reuse the session created by that initial authentication.
  2. Related to the above: if you are using IAM roles, the access and secret keys are obtained from the instance metadata, and there is a limit on how often you can call the instance metadata service to retrieve them.
  3. Add logging at each step to find exactly which line is taking the time. If you have already done this and found that the query itself is slow, you will need to look closely at what you are querying and how.
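For point 3, a small timing helper makes it easy to see where the time goes. This is a minimal sketch using only the standard library; `timed` and the step labels are hypothetical names. Wrapping each step (session creation, `start_query_execution`, the polling loop, each `get_query_results` page) shows whether the time is spent on authentication, query execution, or fetching results.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, timings):
    """Add the wall-clock duration of the enclosed block to timings[label].

    Durations accumulate, so a label used inside a loop (for example one
    get_query_results call per page) reports the total across iterations.
    The time is recorded even if the block raises.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)

# Hypothetical usage around the steps of an Athena query:
# timings = {}
# with timed('session', timings):
#     client = boto3.Session(profile_name=profile).client('athena')
# with timed('fetch_page', timings):
#     results = client.get_query_results(...)
# print(timings)
```

Accumulating per label (rather than overwriting) is a deliberate choice so that the paging loop shows up as one total instead of only the last page.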

I'm not an expert, but I hope this helps. If you have more information to add, or if you find the solution yourself, please post a reply so that we can discuss further.

Regards

Vignesh N

Answered 2 years ago
  • Hi Vignesh

    Thank you for your response. Here is a snippet of what my code does (the lines are copied from my Python code).

    ```python
    import time
    import boto3

    self.sess = boto3.Session(profile_name=profile, region_name=region)
    self.client = self.sess.client('athena')

    execution_id = self.client.start_query_execution(
        QueryString=queryStr,
        QueryExecutionContext={'Database': '....'},
        ResultConfiguration={'OutputLocation': 's3://...'},
        WorkGroup='AmazonAthenaLakeFormation')

    # Poll until the query reaches a terminal state
    while True:
        stats = self.client.get_query_execution(QueryExecutionId=execution_id['QueryExecutionId'])
        status = stats['QueryExecution']['Status']['State']
        if status in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
            break
        time.sleep(0.5)

    if status == 'SUCCEEDED':
        results = self.client.get_query_results(
            QueryExecutionId=execution_id['QueryExecutionId'], MaxResults=1000)
        # returns a pandas DataFrame and the column names
        dataframe, columns = self.returnResultsAsPandaFrame(results)
        while 'NextToken' in results:
            # this call has a performance impact
            results = self.client.get_query_results(
                QueryExecutionId=execution_id['QueryExecutionId'],
                MaxResults=1000,
                NextToken=results['NextToken'])
            # this call does not have a performance impact
            dataframe = self.appendData(dataframe, columns, results)
    ```

    I believe authentication happens only once. (Continued in the next comment...)

  • Does AWS Athena create execution logs for a query? Looking at them might provide some clues. I'd appreciate any guidance on this.
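On the logging question, one place to look is the `Statistics` section of the `get_query_execution` response, which reports Athena's own timings, such as `QueryQueueTimeInMillis`, `QueryPlanningTimeInMillis`, `EngineExecutionTimeInMillis`, `ServiceProcessingTimeInMillis` and `TotalExecutionTimeInMillis`, along with `DataScannedInBytes`. The sketch below assumes you already have that response dict; `query_stats` and `client_overhead_ms` are hypothetical helper names. Comparing Athena's total against the elapsed time you measure in the script suggests whether the minute is spent inside Athena or on the client side (credentials, polling, result fetching).

```python
STAT_FIELDS = (
    'QueryQueueTimeInMillis',
    'QueryPlanningTimeInMillis',
    'EngineExecutionTimeInMillis',
    'ServiceProcessingTimeInMillis',
    'TotalExecutionTimeInMillis',
    'DataScannedInBytes',
)

def query_stats(execution_response):
    """Extract Athena's timing/size statistics from a get_query_execution response.

    Fields that are absent (for example while the query is still running)
    are returned as None.
    """
    stats = execution_response['QueryExecution'].get('Statistics', {})
    return {field: stats.get(field) for field in STAT_FIELDS}

def client_overhead_ms(stats, client_elapsed_seconds):
    """Milliseconds of client-measured time not covered by Athena's total.

    A large value points at client-side costs rather than the query itself.
    Returns None if Athena has not reported a total yet.
    """
    total = stats.get('TotalExecutionTimeInMillis')
    if total is None:
        return None
    return client_elapsed_seconds * 1000 - total
```

Hypothetical usage: `stats = query_stats(client.get_query_execution(QueryExecutionId=qid))`, then `client_overhead_ms(stats, elapsed)` where `elapsed` is the wall-clock seconds your script measured for the whole query.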
