Option 1 is by far the better option, as your access will be distributed over the entire key space. Setting the partition key of your table to a static value is an anti-pattern and should be avoided at all costs if you want a scalable solution.
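To illustrate why distinct partition keys spread load while a static key does not, here is a minimal simulation. DynamoDB's internal hash function is not public, so MD5 and a simple modulo are assumptions used only to show the effect; real DynamoDB assigns hash ranges to partitions rather than using modulo.

```python
import hashlib
from collections import Counter

# Hypothetical stand-in for DynamoDB's internal hash (the real one is not public).
# Modulo over a fixed partition count is a simplification of hash-range assignment.
def partition_for(pk: str, num_partitions: int) -> int:
    digest = hashlib.md5(pk.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def distribution(keys, num_partitions):
    """Count how many items land on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]

# Option 1: a distinct partition key per item.
unique_keys = [f"item-{i}" for i in range(10_000)]
# Option 2: every item shares the same static partition key.
static_keys = ["static"] * 10_000

print("unique:", distribution(unique_keys, 4))  # roughly 2,500 per partition
print("static:", distribution(static_keys, 4))  # all 10,000 on one partition
```

However many partitions the table has, the static key hashes to the same value every time, so only one of them ever receives traffic.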
- Will I be able to write more than 1000 WRU to my table, or will I be throttled as all data is placed into a single partition?
You will be throttled as all data is written to a single partition.
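The throttling follows from simple arithmetic. Per the DynamoDB documentation, a single partition supports at most 1,000 WCU, and one WCU covers one standard write per second of an item up to 1 KB. The sketch below ignores burst capacity and adaptive capacity, which may absorb short spikes; the function names are hypothetical.

```python
import math

PARTITION_WCU_LIMIT = 1_000  # per-partition write ceiling from the DynamoDB docs

def wcu_per_write(item_size_bytes: int) -> int:
    """A standard (non-transactional) write costs 1 WCU per 1 KB, rounded up."""
    return max(1, math.ceil(item_size_bytes / 1024))

def throttled_wcu(demand_by_partition: dict[int, int]) -> int:
    """WCU per second that exceed the per-partition ceiling and would be throttled."""
    return sum(max(0, d - PARTITION_WCU_LIMIT) for d in demand_by_partition.values())

# 3,000 writes/s of 1 KB items, all under one static partition key:
static = {0: 3_000 * wcu_per_write(1024)}
# The same load spread evenly over 4 partitions by distinct keys:
spread = {p: 750 * wcu_per_write(1024) for p in range(4)}

print(throttled_wcu(static))  # 2000
print(throttled_wcu(spread))  # 0
```

Provisioning more total capacity does not change the first result: with a single partition key value, all demand still lands on one partition.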
- If I am throttled, what is the expected delay until the partition is split?
You cannot assume the partition will ever be split, so you should not design a schema that relies on DynamoDB's adaptive capacity. For example, if your SK is a timestamp that is ever increasing (or decreasing), splitting the partition would not help, because new writes would keep landing on the same side of the split. DynamoDB will therefore not split it, and you are capped at 1,000 WCU.
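DynamoDB does not document exactly where or when it splits an item collection. The sketch below assumes a split at the median existing sort key, purely to illustrate why an ever-increasing sort key defeats a split: every new write falls in the upper half, so one partition still takes all the traffic. The helper names are hypothetical.

```python
import random

def split_point(sort_keys):
    """Assumed split rule: divide the existing item collection at its median sort key."""
    ordered = sorted(sort_keys)
    return ordered[len(ordered) // 2]

def upper_half_share(existing, new_writes):
    """Fraction of new writes that land in the upper half after the split."""
    cut = split_point(existing)
    upper = sum(1 for sk in new_writes if sk >= cut)
    return upper / len(new_writes)

# Ever-increasing timestamps (epoch seconds) as sort keys.
base = 1_700_000_000
existing_ts = list(range(base, base + 1_000))
new_ts = list(range(base + 1_000, base + 2_000))
print(upper_half_share(existing_ts, new_ts))  # 1.0: the split does not spread writes

# Random sort keys, by contrast, send roughly half of new writes to each side.
rng = random.Random(42)
existing_r = [rng.random() for _ in range(1_000)]
new_r = [rng.random() for _ in range(1_000)]
share_random = upper_half_share(existing_r, new_r)
print(round(share_random, 2))
```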
- Will the table scale in number of partitions when I exceed 50% of the initial WRU (4000)?
Yes, if you exceed 50% of your previous peak for a sustained period (~30 minutes), DynamoDB will split partitions to provide more throughput.
- Are there any other concerns with option 2 over option 1? Note: using both pk and sk and setting pk to multiple values is not an option being evaluated.
Option 2 is a concern because it relies on DynamoDB splitting partitions, something you have no control over, and until that happens your throughput is limited.
Why should I not design a schema with adaptive capacity in mind if it is an advertised feature and the documentation states that item collections can be split across multiple partitions? Can you point to documentation stating that table design should not rely on adaptive capacity?
I'm trying to understand the failure modes of option 2 and weighing them against the benefit of doing range queries to reduce overall traffic on the table.
Your Q didn't mention range queries. You presented 2 options, implying your access patterns could be satisfied with either one, and between those choices Opt 1 is simpler and more certain to work. If you need range queries, then maybe you'll want to use a sort key, perhaps with sharded PK values.
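A sort key with sharded PK values keeps range queries possible while spreading writes. The in-memory sketch below assumes a hypothetical `NUM_SHARDS` and key format `base#shard`; in DynamoDB, each iteration of the loop in `range_query` would correspond to one Query with a key condition of the form `pk = :p AND sk BETWEEN :a AND :b`, and the client merges the results.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical; size it from the expected peak write rate

def shard_pk(base: str, item_id: str) -> str:
    """Derive a sharded partition key such as 'events#2' from a stable hash of the item id."""
    h = int.from_bytes(hashlib.sha256(item_id.encode("utf-8")).digest()[:4], "big")
    return f"{base}#{h % NUM_SHARDS}"

# In-memory stand-in for the table: pk -> {sk: item}
table: dict[str, dict[str, dict]] = defaultdict(dict)

def put(base: str, item_id: str, sk: str, attrs: dict) -> None:
    table[shard_pk(base, item_id)][sk] = {"id": item_id, "sk": sk, **attrs}

def range_query(base: str, sk_from: str, sk_to: str) -> list[dict]:
    """Scatter-gather: read the sort-key range from every shard, then merge by sort key."""
    results = []
    for shard in range(NUM_SHARDS):
        items = table.get(f"{base}#{shard}", {})
        results.extend(v for k, v in items.items() if sk_from <= k <= sk_to)
    return sorted(results, key=lambda item: item["sk"])

for i in range(20):
    put("events", f"id-{i}", f"2024-01-{i + 1:02d}", {"n": i})

hits = range_query("events", "2024-01-05", "2024-01-09")
print([h["n"] for h in hits])  # [4, 5, 6, 7, 8]
```

The trade-off is that every range query costs `NUM_SHARDS` reads instead of one, in exchange for writes no longer being confined to a single partition.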
The downside of relying on adaptive capacity's item isolation is that the rules for when it applies are unwritten and subject to change; plus, even in the best case, it takes time under pressure to kick in. Imagine you provision a large amount of capacity (and thus lots of partitions): if you have just one PK value, everything is still going to go to one partition at the start.