Based on the information you've provided, the persistent throttling on your Kinesis Data Stream (KDS) is most likely caused by hitting the GetRecords API call limit rather than the data throughput limit. Here's an analysis of the situation and some recommendations:
- Root Cause of Throttling: The consistent "Request Exceeded Threshold" metric at 40-50%, regardless of shard count, strongly suggests that you're hitting the GetRecords API call limit rather than the 2 MB/second per-shard read limit. Each shard in Kinesis Data Streams supports at most 5 GetRecords calls per second, shared across every consumer reading from that shard. With 32 consumers polling 96 shards, any shard that is polled by several consumers at once can easily exceed 5 calls per second.
The fact that throttling occurs even during periods with no incoming data further supports this conclusion: your consumers keep polling the stream even when there is no new data to process, and each of those empty GetRecords calls still counts against the per-shard call limit.
- Recommendations for Optimal Consumer Configuration:
a. Adjust GetRecords Polling Frequency: Instead of a fixed 1-second wait on throttling, use exponential backoff with jitter when you encounter throttling errors. Start with a small delay (e.g., 100 ms) and increase it exponentially up to a maximum (e.g., 1 second) if throttling persists (see the sketch after this list).
b. Optimize Batch Size: Your current batch size of 1,500 records is reasonable. However, consider adjusting your polling dynamically based on actual data volume; during low-traffic periods, polling each shard less frequently (rather than only shrinking the batch) is what actually reduces unnecessary API calls.
c. Implement Adaptive Reads: If your consumer framework supports it, enable adaptive reads (the Kinesis source for Managed Service for Apache Flink offers this, for example). Adaptive reads automatically tune how much and how often the consumer reads based on the current data volume and processing speed, which reduces unnecessary API calls during quiet periods.
d. Consider Using Enhanced Fan-Out: If low latency is crucial for your use case, consider Enhanced Fan-Out consumers. Each registered consumer gets a dedicated 2 MB/second of read throughput per shard, and records are pushed over HTTP/2 via SubscribeToShard instead of being polled with GetRecords, so these reads don't count against the shared 5-calls-per-second limit (a registration sketch follows this list).
e. Optimize Number of Consumers: Review your consumer count (currently 32) in relation to your shard count (96). Consumers that poll the same shard share that shard's 5 GetRecords calls per second, so balance shard assignments to avoid over-polling individual shards.
f. Implement Proper Error Handling: Instead of a fixed 1-second wait on throttling, implement error handling that adapts to the observed throttling rate and backpressure; the backoff-with-jitter approach from item (a) covers most of this.
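To make items (a) and (f) concrete, here is a minimal sketch of a GetRecords loop with exponential backoff and full jitter using boto3. The 1,500-record batch limit and the 100 ms to 1 second delay bounds come from your description; the function name, retry cap, and overall structure are illustrative assumptions, not a prescribed implementation:

```python
import random
import time

import boto3
from botocore.exceptions import ClientError

kinesis = boto3.client("kinesis")

# Illustrative backoff parameters; tune for your latency budget.
BASE_DELAY_S = 0.1   # start around 100 ms
MAX_DELAY_S = 1.0    # cap at 1 second
BATCH_LIMIT = 1500   # current batch size from your setup


def get_records_with_backoff(shard_iterator, max_attempts=5):
    """Call GetRecords, backing off exponentially with jitter on throttling."""
    delay = BASE_DELAY_S
    for _ in range(max_attempts):
        try:
            return kinesis.get_records(ShardIterator=shard_iterator,
                                       Limit=BATCH_LIMIT)
        except ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise  # not a throttling error; surface it
            # Full jitter: sleep a random amount up to the current delay,
            # then double the delay up to the cap.
            time.sleep(random.uniform(0, delay))
            delay = min(delay * 2, MAX_DELAY_S)
    raise RuntimeError("GetRecords still throttled after retries")
```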
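For item (d), registering an Enhanced Fan-Out consumer and subscribing to a shard with boto3 looks roughly like the sketch below. The stream ARN, consumer name, and shard ID are placeholders; in practice, KCL 2.x can manage the registration and shard subscriptions for you:

```python
import time

import boto3

kinesis = boto3.client("kinesis")

# Placeholder stream ARN and consumer name -- replace with your own.
STREAM_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"

# Register a dedicated-throughput (Enhanced Fan-Out) consumer.
consumer = kinesis.register_stream_consumer(
    StreamARN=STREAM_ARN,
    ConsumerName="low-latency-consumer",
)["Consumer"]

# Wait until the new consumer is ACTIVE before subscribing.
while kinesis.describe_stream_consumer(
        ConsumerARN=consumer["ConsumerARN"]
)["ConsumerDescription"]["ConsumerStatus"] != "ACTIVE":
    time.sleep(1)

# Subscribe to one shard; records are now pushed over HTTP/2 instead of
# being polled with GetRecords. A subscription lasts about 5 minutes and
# must be renewed by calling subscribe_to_shard again.
response = kinesis.subscribe_to_shard(
    ConsumerARN=consumer["ConsumerARN"],
    ShardId="shardId-000000000000",
    StartingPosition={"Type": "LATEST"},
)
for event in response["EventStream"]:
    records = event["SubscribeToShardEvent"]["Records"]
    # ...process records here...
```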
By implementing these recommendations, you should be able to reduce the throttling while still meeting your application's low-latency requirements. Monitor the "ReadProvisionedThroughputExceeded" metric in CloudWatch after making these changes to verify the improvement (a query sketch is shown below). If issues persist, you may need to fine-tune your consumer application further, or increase the number of shards if your data volume genuinely requires it.
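To check the effect of the changes, you can pull that metric directly with boto3. A minimal sketch, assuming a stream named example-stream (replace with your stream name):

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Pull the last hour of ReadProvisionedThroughputExceeded for the stream.
now = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Kinesis",
    MetricName="ReadProvisionedThroughputExceeded",
    Dimensions=[{"Name": "StreamName", "Value": "example-stream"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,           # 5-minute buckets
    Statistics=["Sum"],   # throttled GetRecords calls per bucket
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```

A sustained drop in these values after the changes indicates the GetRecords call rate is back under the per-shard limit.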
Sources
Handle startup, shutdown, and throttling - Amazon Kinesis Data Streams
Upstream or source throttling from a Kinesis data stream - Managed Service for Apache Flink
Troubleshoot Kinesis Data Streams issues in DynamoDB | AWS re:Post