My Amazon Athena queries are taking a long time to run. The query queuing times are high. How can I speed up the query processing?
Athena is a serverless interactive query service. After you submit your queries to Athena, the queries are run on a pool of resources in the backend. These resources are shared by all users in the Region. Your queries might be temporarily queued before they run. Queries generally take a long time to run because of a higher queuing time, a higher planning time, or a higher engine processing time.
Call the GetQueryExecution API for your query ID. This API returns information about a single execution of the query. The information includes useful details, such as the following:
- Time spent by the query in queuing (QueryQueueTimeInMillis)
- Time spent by the query in planning (QueryPlanningTimeInMillis)
- Engine processing time (EngineExecutionTimeInMillis)
This information can be viewed in the Statistics parameter of the API response.
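As a minimal sketch, the timing fields listed above can be read from a GetQueryExecution response in Python. The helper names `timing_breakdown` and `fetch_timing` are hypothetical, and the sample values are made up for illustration. `fetch_timing` assumes that boto3 is installed and that AWS credentials are configured.

```python
def timing_breakdown(response):
    """Pull the three timing fields out of a GetQueryExecution response dict."""
    stats = response.get("QueryExecution", {}).get("Statistics", {})
    return {
        "queue_ms": stats.get("QueryQueueTimeInMillis", 0),
        "planning_ms": stats.get("QueryPlanningTimeInMillis", 0),
        "engine_ms": stats.get("EngineExecutionTimeInMillis", 0),
    }


def fetch_timing(query_execution_id, region_name=None):
    """Call Athena for one query ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so timing_breakdown works without boto3

    client = boto3.client("athena", region_name=region_name)
    response = client.get_query_execution(QueryExecutionId=query_execution_id)
    return timing_breakdown(response)


# Made-up response fragment, for illustration only.
sample_response = {
    "QueryExecution": {
        "Statistics": {
            "QueryQueueTimeInMillis": 42000,
            "QueryPlanningTimeInMillis": 1500,
            "EngineExecutionTimeInMillis": 9000,
        }
    }
}
breakdown = timing_breakdown(sample_response)
print(breakdown)
```

In this made-up sample, the queuing time is much larger than the engine processing time, so the guidance on queuing time would be the place to start.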
Higher queuing time
Your queries might have a higher queuing time because of high resource usage in the backend. The queuing time in Athena is dependent on resource allocation. After you submit your queries to Athena, the queries are processed by assigning resources based on the following:
- Overall service load
- Number of incoming requests
If your queries have a higher queuing time, then do the following to improve query performance:
- Consider distributing your queries over a period of time. If you submit queries in batches, then submit small batches more frequently rather than large batches less frequently. This might reduce the time that a query stays in the QUEUED state and improve the overall query processing time.
- Run a combination of simple and complex queries instead of running a set of complex queries at the same time. Also, consider submitting simple queries first followed by complex queries. Because simple queries are processed quickly, resources can be allocated to the complex queries without incurring a higher queuing time.
- If you schedule queries for tasks such as generating periodic reports or loading new partitions, then avoid scheduling them at the start of the hour or at 30 minutes past the hour. Most automated scripts and cron jobs run at these times. Therefore, the service load is usually higher during these periods, which results in increased queuing times.
- If your use case permits, run your queries in multiple Regions. This can distribute the load and help acquire more backend resources. This approach can reduce the query queuing time.
Important: You might incur Amazon Simple Storage Service (Amazon S3) cross-Region charges.
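The batching and scheduling suggestions above can be sketched in Python as follows. Both helper names and the margin and offset values are assumptions chosen for illustration. They aren't Athena settings or recommended numbers.

```python
from datetime import datetime, timedelta


def small_batches(queries, batch_size):
    """Split a list of queries into consecutive batches of at most batch_size."""
    return [queries[i:i + batch_size] for i in range(0, len(queries), batch_size)]


def shift_off_peak(run_at, margin_minutes=5, offset_minutes=10):
    """Delay run_at if it falls within margin_minutes of :00 or :30.

    The margin and offset are illustrative assumptions, not service limits.
    """
    into_half_hour = run_at.minute % 30
    distance = min(into_half_hour, 30 - into_half_hour)
    if distance < margin_minutes:
        return run_at + timedelta(minutes=offset_minutes)
    return run_at


batches = small_batches(["q1", "q2", "q3", "q4", "q5"], batch_size=2)
on_the_hour = shift_off_peak(datetime(2024, 1, 1, 9, 0))
mid_slot = shift_off_peak(datetime(2024, 1, 1, 9, 15))
print(batches, on_the_hour, mid_slot)
```

A scheduler could submit one batch at a time, starting each batch at the adjusted time instead of at the top of the hour.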
Higher planning time
If your queries have a higher planning time, then the cause might be over-partitioning of the table. Tables with hundreds or thousands of partitions can result in slower query processing. To improve query performance, try one or more of the following:
- Consider reducing the number of partitions.
- Query over one partition at a time and concatenate the results.
- Use partition projection to help speed up query processing of highly partitioned tables and automate partition management.
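To illustrate the partition projection option, the following Python sketch builds an ALTER TABLE statement that sets projection properties for a single date-typed partition column. The function name, table name, column name, date range, and S3 location are hypothetical placeholders. Check the property values against your own table layout before you run the generated statement.

```python
def projection_ddl(table, column, date_range, date_format, location_template):
    """Build an ALTER TABLE statement that turns on date-based partition projection."""
    props = {
        "projection.enabled": "true",
        f"projection.{column}.type": "date",
        f"projection.{column}.range": date_range,
        f"projection.{column}.format": date_format,
        "storage.location.template": location_template,
    }
    body = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES (\n  {body}\n)"


# Hypothetical table, column, and bucket names, for illustration only.
ddl = projection_ddl(
    table="example_logs",
    column="dt",
    date_range="2023/01/01,NOW",
    date_format="yyyy/MM/dd",
    location_template="s3://example-bucket/logs/${dt}/",
)
print(ddl)
```

With projection turned on, Athena calculates partition locations from these properties instead of retrieving them from the metastore, which is why it can help with highly partitioned tables.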
Higher processing time
If your queries have a higher engine processing time, then do the following to improve query performance:
- Partition your tables to restrict the amount of data scanned by each query. Partitions act as virtual columns and keep related data together based on column values. Partitioning your tables can improve query performance and reduce costs. For more information, see Partitioning data.
- If the Amazon S3 files that you query are small (generally less than 128 MB), then the query processing time might be higher. The increase in time is due to the overhead involved in tasks such as opening the S3 files, listing directories, and setting up data transfer. Use the S3DistCp utility on Amazon EMR to combine smaller S3 files into larger objects. Larger objects require fewer Amazon S3 requests, which reduces the query processing time.
- Perform other storage and query optimizations to improve performance and lower the engine processing times. For more information, see Top 10 performance tuning tips for Amazon Athena.
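Before you combine files, it can help to check how many objects under a table location fall below the 128 MB guideline. The following Python sketch is one way to do that. The helper names and the sample listing are hypothetical. `list_prefix` assumes that boto3 is installed and that AWS credentials are configured.

```python
SMALL_OBJECT_BYTES = 128 * 1024 * 1024  # the "generally less than 128 MB" guideline


def small_objects(objects, threshold=SMALL_OBJECT_BYTES):
    """Return keys of objects whose Size is below threshold.

    `objects` uses the shape of S3 ListObjectsV2 "Contents" entries.
    """
    return [obj["Key"] for obj in objects if obj["Size"] < threshold]


def list_prefix(bucket, prefix):
    """List all objects under a prefix (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so small_objects works without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    found = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        found.extend(page.get("Contents", []))
    return found


# Made-up listing, for illustration only.
sample_listing = [
    {"Key": "logs/dt=2024-01-01/part-0000", "Size": 4 * 1024 * 1024},
    {"Key": "logs/dt=2024-01-01/part-0001", "Size": 256 * 1024 * 1024},
    {"Key": "logs/dt=2024-01-02/part-0000", "Size": 90 * 1024 * 1024},
]
too_small = small_objects(sample_listing)
print(too_small)
```

A large share of small objects under a prefix suggests that compaction, for example with S3DistCp, might be worth trying.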
Note: You can submit several queries to Athena at the same time, up to the default query-related quotas in your Region. Athena processes queries by assigning resources based on the overall service load and the number of incoming requests. Therefore, your submitted queries might not all run concurrently.