How do I determine throttling in my CloudWatch logs?
I receive a "RequestLimitExceeded" or "ThrottlingException" error when working with Amazon CloudWatch logs, and my API call is throttled.
When you work with CloudWatch logs, you might exceed the API rate limit. When this happens, you receive a RequestLimitExceeded or ThrottlingException error, and your API call is throttled. You must identify where and when throttling is happening so you can resolve these errors and make informed rate limit increase requests.
Intermittent throttling on CloudWatch logs when accessing logs
You can use the FilterLogEvents or GetLogEvents API calls to list your log events or log streams. These API calls have a hard limit, and they don't qualify for a limit increase. For example, if you use the FilterLogEvents API to search for log events from a specified log group, then the API has a default quota of 5 transactions per second (TPS) per account, per AWS Region. If you reach this limit, then you receive the RateExceeded error.
Use these best practices to avoid throttling errors in this use case:
Use CloudWatch Logs Insights to quickly get log data from CloudWatch logs. Use queries to filter your logs to view specific log groups.
Export log data to Amazon Simple Storage Service (Amazon S3) for batch use cases. Note: Log data can take up to 12 hours to become available for export from CloudWatch Logs. Therefore, it's not a best practice to use this method for real-time analysis and processing.
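The first practice above can be sketched in Python. This is a minimal example, not a definitive implementation. It assumes that the logs_client argument behaves like the object returned by boto3.client("logs"), and the function name run_insights_query and its polling parameters are hypothetical. The query is started once and then polled for results, so a single search doesn't turn into many FilterLogEvents calls.

```python
import time


def run_insights_query(logs_client, log_group, query, start_time, end_time,
                       poll_seconds=1.0, max_polls=60):
    """Run one Logs Insights query and return its result rows.

    Assumption: logs_client behaves like boto3.client("logs").
    start_time and end_time are epoch seconds.
    """
    started = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_time,
        endTime=end_time,
        queryString=query,
    )
    query_id = started["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            return response["results"]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        # The query is still scheduled or running, so wait before polling again.
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not complete after {max_polls} polls")
```

With boto3 installed, a call might look like run_insights_query(boto3.client("logs"), "/example/log-group", "fields @timestamp, @message | filter @message like /ERROR/ | limit 20", start, end), where the log group name is a placeholder.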
ThrottlingException errors when using an application/script to fetch CloudWatch log data
To collect CloudWatch logs, you can develop a collector script. This script attempts a DescribeLogStreams or GetLogEvents API call to pull data from different log streams or different time frames in the same log group. API calls such as FilterLogEvents, GetLogEvents, and DescribeLogStreams are designed for human interaction and not for automation. If your script sends these calls at a high rate, then you receive a ThrottlingException error and the API call is throttled.
Use these best practices to avoid throttling in this use case:
Distribute your API calls over time. Schedule actions with some randomization so that they are spread over a period of time.
Add sleep intervals between consecutive API calls. Add some delay between API calls that are sent from the same script or application. If you send API calls in rapid succession, then you're more likely to receive rate errors.
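The two practices above can be sketched with the Python standard library only. This is a minimal example under stated assumptions: the helper names (is_throttle_error, call_with_backoff, paced_calls) and the default delays are hypothetical, and the error check assumes that exceptions carry a response dictionary in the shape that botocore's ClientError uses. In addition to pacing, the sketch retries throttled calls with exponential backoff and random jitter, which is a common companion technique and not something the text above prescribes.

```python
import random
import time

# Error codes treated as throttling (taken from the errors named in this article).
THROTTLE_CODES = {"ThrottlingException", "RequestLimitExceeded"}


def is_throttle_error(exc):
    """Return True if exc looks like a botocore ClientError for throttling."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") in THROTTLE_CODES


def call_with_backoff(api_call, max_attempts=5, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep):
    """Call api_call(), retrying throttling errors with jittered exponential delay."""
    for attempt in range(max_attempts):
        try:
            return api_call()
        except Exception as exc:
            # Re-raise anything that is not throttling, or the final failed attempt.
            if not is_throttle_error(exc) or attempt == max_attempts - 1:
                raise
            # Wait a random time up to the capped exponential bound.
            bound = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, bound))


def paced_calls(api_calls, interval=0.25, jitter=0.25, sleep=time.sleep):
    """Run each call in turn, sleeping interval plus random jitter between calls."""
    results = []
    for index, api_call in enumerate(api_calls):
        if index > 0:
            sleep(interval + random.uniform(0, jitter))
        results.append(call_with_backoff(api_call, sleep=sleep))
    return results
```

Each entry in api_calls is a zero-argument callable, for example a lambda that wraps one GetLogEvents request for one log stream. The sleep parameter exists so that the delay can be replaced in tests.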
In some cases, you might use a SIEM solution such as Splunk to fetch logs from CloudWatch. SIEM solutions gather data from multiple systems and analyze this data to detect unusual behavior. You might experience API throttling when you use the Splunk plugin. To avoid this issue, create a CloudWatch Logs subscription filter with Amazon Kinesis Data Firehose and deliver the log data to Splunk. For more information, see the Splunk documentation for Configure Kinesis inputs for the Splunk Add-on for AWS.
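Creating the subscription filter can be sketched as follows. This is a minimal example that assumes logs_client behaves like boto3.client("logs"). The function name, the default filter name, and all ARNs are hypothetical placeholders. The sketch also assumes that the Firehose delivery stream already exists and that the IAM role allows CloudWatch Logs to write to it.

```python
def subscribe_log_group_to_firehose(logs_client, log_group, firehose_arn, role_arn,
                                    filter_name="to-firehose", filter_pattern=""):
    """Stream events from one log group to a Firehose delivery stream.

    Assumption: logs_client behaves like boto3.client("logs").
    """
    logs_client.put_subscription_filter(
        logGroupName=log_group,
        filterName=filter_name,
        filterPattern=filter_pattern,  # an empty pattern matches every log event
        destinationArn=firehose_arn,
        roleArn=role_arn,
    )
```

After the filter is in place, new log events are pushed to the delivery stream, so the collector no longer needs to poll with GetLogEvents or FilterLogEvents.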
Throttling errors when integrating PutLogEvents API calls with a Lambda function
The PutLogEvents API call uploads logs to a specified log stream in batches of up to 1 MB. This API has a rate limit of 800 TPS per account, per Region. The exceptions are the following Regions, where the quota is 1500 TPS per account, per Region: US East (N. Virginia), US West (Oregon), and Europe (Ireland). You can request a quota increase.
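Because the quota counts requests and not events, sending fewer, fuller batches reduces the number of PutLogEvents calls. The following sketch groups events into batches that fit in one request. It is a minimal example: the function names are hypothetical, and it assumes each event is a dictionary with timestamp (milliseconds) and message keys. The limits used are the 1,048,576-byte batch size (message bytes in UTF-8 plus 26 bytes per event) and 10,000 events per batch. The sketch doesn't handle other PutLogEvents constraints, such as the time span that a single batch can cover.

```python
MAX_BATCH_BYTES = 1_048_576   # size limit for one PutLogEvents request
MAX_BATCH_EVENTS = 10_000     # maximum number of events in one request
EVENT_OVERHEAD_BYTES = 26     # per-event overhead counted toward the size limit


def event_size(event):
    """Return the number of bytes this event counts toward the batch limit."""
    return len(event["message"].encode("utf-8")) + EVENT_OVERHEAD_BYTES


def batch_events(events):
    """Yield lists of events that each fit in one PutLogEvents request."""
    batch, batch_bytes = [], 0
    # Events in a batch must be in chronological order.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        size = event_size(event)
        if size > MAX_BATCH_BYTES:
            raise ValueError("single event exceeds the batch size limit")
        # Start a new batch when adding this event would break either limit.
        if batch and (batch_bytes + size > MAX_BATCH_BYTES
                      or len(batch) >= MAX_BATCH_EVENTS):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(event)
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded batch can then be passed as the logEvents argument of one PutLogEvents call.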
AWS defines quotas for services to protect performance and to ensure availability. CloudWatch has quotas for metrics, alarms, API requests, and alarm email notifications. Use these steps to visualize your service quotas and set alarms if you reach the threshold: