Please help interpreting error message: Fail number 1/5 | Exception: An error occurred (ThrottlingException) when calling the GetWorkGroup operation (reached max retries: 5): Rate exceeded


I have a parallelized Python script running in containers which transforms data and writes it to S3 with updates to the Glue catalog. Each container runs several tasks in parallel, and the overall data processing job is scaled horizontally by running several containers at once. In total there are more than 50 independent tasks running at any time.

As I scale the number of independent tasks I see this error in the logs on occasion:

    ERROR:awswrangler._utils:Retrying <bound method ClientCreator._create_api_method.<locals>._api_call of <botocore.client.Athena object at 0x7fc04041f200>> | Fail number 1/5 | Exception: An error occurred (ThrottlingException) when calling the GetWorkGroup operation (reached max retries: 5): Rate exceeded

The code I believe this is in response to is:

    wr.s3.to_parquet(
        df=ac1_dataset_writeable,
        path=f"s3://{bucket_name}/{ac1_prefix}/",
        dataset=True,
        mode=parquetwritemode,
        database="f-options-ac1",
        table="ac1",
        partition_cols=["ac1_symbol"],
    )

I understand this is caused by attempted simultaneous writes to the catalog. What I am struggling with is that the error message itself is somewhat confusing, specifically as it says:

  • Retrying
  • Fail number 1/5
  • reached max retries: 5

all in the same error message. It is not evident to me whether the function will try again to write my data (Retrying) or whether it has exhausted the available retries and will not try again (reached max retries: 5). If anyone has experience in how to properly interpret this message, I would be grateful for the benefit of your experience.

asked 2 months ago, 144 views
2 Answers

The error you're seeing indicates that your requests are exceeding an API rate limit for Amazon Athena: the throttled call is GetWorkGroup, and the service answered with "Rate exceeded". In addition to its quotas on concurrent queries, Athena limits how often each API operation can be called per account, so that no single user dominates shared resources.

A few things you could try:

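  • Add delays or backoff with jitter between the calls your tasks make to Athena, to smooth out the request rate over time.
  • Check the Service Quotas console for the Athena API quotas that apply to your account, and request an increase where a quota is adjustable. IAM roles and policies control permissions, not quotas.
  • Consider capping how many tasks call Athena at the same moment, for example by limiting the number of concurrent writers per container. I am not aware of an Athena feature that scales the API rate limit automatically.
  • For very high throughput needs, look into whether AWS Glue ETL jobs fit the workload better; they are subject to their own quotas, so check those before migrating.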

The key is to distribute the load over time rather than having all tasks attempt Athena operations at the exact same time. Some throttling is expected with serverless services, so adding retries and backoff is a best practice. Let me know if any of those suggestions help or if you have additional questions!
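
The backoff suggestion above can be sketched roughly as follows. This is a minimal illustration, not awswrangler's or botocore's own retry logic: `ThrottlingError`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and the base delay, cap, and number of tries are assumed values you would tune for your workload.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for a 'Rate exceeded' error from the service."""


def backoff_delays(max_tries, base=0.5, cap=20.0, rng=random.random):
    """Yield one 'full jitter' delay per retry: uniform in [0, min(cap, base * 2**n))."""
    for n in range(max_tries - 1):
        yield rng() * min(cap, base * 2 ** n)


def call_with_backoff(func, max_tries=6, base=0.5, cap=20.0, sleep=time.sleep):
    """Call func(); on throttling, wait a randomized, growing delay and try again.

    Re-raises the last ThrottlingError once max_tries calls have failed.
    """
    delays = backoff_delays(max_tries, base, cap)
    while True:
        try:
            return func()
        except ThrottlingError:
            delay = next(delays, None)
            if delay is None:  # no retries left
                raise
            sleep(delay)
```

In the script from the question, `func` would be a small function or lambda wrapping the `wr.s3.to_parquet(...)` call, and the `except` clause would instead catch `botocore.exceptions.ClientError` and check that `exc.response["Error"]["Code"]` is `"ThrottlingException"`. The random jitter matters here because more than 50 tasks retrying on the same fixed schedule would collide again on every attempt.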
AWS
answered 2 months ago
reviewed 2 months ago

That is helpful, but I am still uncertain whether the wrangler function will retry, or whether this message means it has failed and will not retry.
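
One plausible reading of the message, offered as an assumption rather than a confirmed description of awswrangler's internals: there are two nested retry layers. The inner one (botocore) gives up and raises an exception whose text ends in "(reached max retries: 5): Rate exceeded". The outer one (the `awswrangler._utils` logger named at the start of the log line) catches that exception, logs "Retrying ... | Fail number 1/5", and calls again. The sketch below imitates that structure with hypothetical names (`ThrottlingError`, `make_inner_call`, `outer_retry`); it does not use either library.

```python
import logging

logging.basicConfig(level=logging.ERROR)
log = logging.getLogger("outer_wrapper")  # hypothetical stand-in for the outer layer's logger


class ThrottlingError(Exception):
    """Hypothetical stand-in for the exception raised by the inner layer."""


def make_inner_call(throttled_calls, inner_max_retries=5):
    """Build a fake API call that fails for the first `throttled_calls` invocations.

    Each failure represents the inner layer having already used up its own retries.
    """
    state = {"outer_calls": 0}

    def inner_call():
        state["outer_calls"] += 1
        if state["outer_calls"] <= throttled_calls:
            # The inner layer's "I gave up" text, which the outer layer will log.
            raise ThrottlingError(f"(reached max retries: {inner_max_retries}): Rate exceeded")
        return "ok"

    return inner_call, state


def outer_retry(func, max_tries=5):
    """Outer layer: log each failure as 'Fail number i/max_tries' and call again."""
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except ThrottlingError as exc:
            if attempt == max_tries:
                raise  # only now does the failure reach the caller
            log.error("Retrying %s | Fail number %d/%d | Exception: %s",
                      func.__name__, attempt, max_tries, exc)


call, state = make_inner_call(throttled_calls=1)
result = outer_retry(call)  # logs one "Fail number 1/5" line, then succeeds
```

Under that assumption, "Fail number 1/5" would mean the outer layer has four attempts left, and a genuinely failed write would surface as an exception raised from `wr.s3.to_parquet` rather than only as a log line.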

answered 2 months ago
