One key way to identify whether your table is facing issues due to a hot partition is through the throttling metrics, which in your case aren't present. In simple terms, a ReadTimeoutException occurs when the client does not get a response from the DAX cluster within the configured time window. This could be due to network issues (between the client and DAX, or between DAX and the table) or load on the DAX cluster. It's ideally recommended to run 3+ nodes in a DAX cluster (one primary, the rest as replicas).
In this case, I believe what could be happening is that, because there is a lot of write-through traffic on the cluster, the cache is being invalidated frequently. This could put significant load on the primary node. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.concepts.cluster.html#DAX.concepts.clusters
It may be worth investigating node-level metrics in CloudWatch for your DAX nodes to confirm whether this is the case. High CPU utilization or memory pressure on one of the nodes could indicate that the primary node is being overloaded.
Recommendations -
- Scale up the node type.
- If your workload is write-heavy and relatively light on reads, you may want to re-evaluate your caching strategy, since caching helps most with read-heavy workloads.
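Since the timeouts above can be transient (network blips or momentary load on a node), a common client-side mitigation is to retry reads with exponential backoff and jitter. This is a minimal sketch, not DAX-specific code: `read_fn` is a hypothetical stand-in for whatever read call your DAX client makes, and `TimeoutError` stands in for the timeout exception your client actually raises.

```python
import random
import time

def read_with_backoff(read_fn, max_attempts=4, base_delay=0.05):
    """Retry a read on timeout with exponential backoff and full jitter.

    `read_fn` is a placeholder for your DAX read call; catch whichever
    exception type your client raises for timeouts in place of TimeoutError.
    """
    for attempt in range(max_attempts):
        try:
            return read_fn()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            # Full jitter: sleep a random fraction of the capped backoff window.
            time.sleep(random.uniform(0, base_delay * (2 ** attempt)))

# Simulated flaky read: times out twice, then succeeds.
attempts = {"n": 0}
def flaky_read():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated DAX read timeout")
    return {"id": "item-1"}

print(read_with_backoff(flaky_read))  # {'id': 'item-1'} after two retries
```

Backoff only masks occasional timeouts; if the cluster is genuinely overloaded, the scaling recommendations above are the real fix.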
When experiencing ReadTimeoutExceptions on DAX during high write activity, there are several important factors to consider about how DAX and DynamoDB interact with hot partitions.
DAX operates as a write-through cache, meaning that write operations (including TransactWriteItems and DeleteItem) are first written to DynamoDB and then to the DAX cache. This process can create bottlenecks that manifest differently than traditional DynamoDB throttling.
While DynamoDB has adaptive capacity to handle hot partitions (automatically increasing throughput capacity for partitions receiving more traffic), DAX has its own throughput limits at the node level. DAX will return a ThrottlingException when its capacity is exceeded, which is separate from DynamoDB's throttling mechanisms.
Your scenario of high volume writes to specific items (same primary keys) suggests a hot partition situation. Even though DynamoDB might be handling this adequately through its adaptive capacity, DAX could be experiencing bottlenecks due to:
- Cache invalidation overhead: When you perform frequent writes to the same items, DAX needs to constantly invalidate and update those cache entries
- Node capacity limitations: DAX has per-node throughput limits that differ from DynamoDB's partition-level limits
- Write-through processing: The additional processing required for write-through operations could be creating contention
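The cache-invalidation point above can be made concrete with a toy model. This is not the actual DAX implementation, just a sketch showing that in a write-through design every write to a hot key does cache work in addition to the store write, so concentrated writes translate directly into cache churn.

```python
class WriteThroughCache:
    """Toy write-through cache: every write hits the backing store first,
    then refreshes the cached copy. Illustrative only, not how DAX is built."""

    def __init__(self):
        self.store = {}        # stands in for the DynamoDB table
        self.cache = {}        # stands in for the DAX item cache
        self.cache_updates = 0

    def write(self, key, value):
        self.store[key] = value   # write to the backing store first...
        self.cache[key] = value   # ...then update the cache entry
        self.cache_updates += 1

    def read(self, key):
        if key in self.cache:          # cache hit
            return self.cache[key]
        value = self.store.get(key)    # cache miss: fall through to the store
        self.cache[key] = value
        return value

c = WriteThroughCache()
for i in range(1000):
    c.write("hot-item", i)  # 1000 writes concentrated on one key
print(c.cache_updates)      # 1000 cache refreshes for a single item
```

In a real cluster that refresh work is serialized through the primary node, which is why a concentrated write pattern can overload one node even when DynamoDB itself is fine.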
The ReadTimeoutExceptions you're seeing likely indicate that DAX is struggling to keep up with the cache invalidation and update operations required by your concentrated write pattern, even though DynamoDB itself isn't being throttled.
To address this issue, consider:
- Increasing the size or number of nodes in your DAX cluster
- Reviewing your access patterns to better distribute writes across partition keys
- Monitoring DAX-specific metrics like ThrottledRequestCount in CloudWatch
- Evaluating whether your workload characteristics are well-suited for DAX, particularly if it's extremely write-heavy on specific items
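For the second recommendation (distributing writes across partition keys), one standard pattern is write sharding: append a deterministic suffix to the hot partition key so writes spread over several physical keys. A minimal sketch, with `NUM_SHARDS` and the key format chosen for illustration, not taken from your schema:

```python
import hashlib

NUM_SHARDS = 10  # tune to your write volume; this value is illustrative

def sharded_key(base_key: str, sort_component: str) -> str:
    """Append a deterministic shard suffix so writes to one logical key
    spread across NUM_SHARDS physical partition keys."""
    digest = hashlib.md5(sort_component.encode()).hexdigest()
    shard = int(digest, 16) % NUM_SHARDS
    return f"{base_key}#{shard}"

def all_shards(base_key: str):
    """Keys to fan out across when reading the logical item back."""
    return [f"{base_key}#{s}" for s in range(NUM_SHARDS)]

# Writes for the same logical key land on multiple partition keys:
keys = {sharded_key("order-123", f"event-{i}") for i in range(100)}
print(len(keys))  # up to NUM_SHARDS distinct partition keys
```

The trade-off is that reads for the logical item must now query every shard, so this suits write-heavy items whose reads can tolerate a fan-out (or an aggregation step).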
Sources
DynamoDB burst and adaptive capacity - Amazon DynamoDB
Key range throughput exceeded (hot partitions) - Amazon DynamoDB
DAX: How it works - Amazon DynamoDB
