DynamoDB: how does partitioning work once the sort key is involved?


My question is about this AWS blog post: https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/

Where it says: “DynamoDB splits partitions by sort key if the collection size grows bigger than 10 GB.”

Does that mean the sort key is used for partitioning, i.e. that PK+SK is used to calculate the hash? Or does it only mean that items are sorted by SK, so that once a partition fills up, the next partition picks up where the previous one left off, while the hash is still calculated from the PK alone?

I’m wondering whether the cardinality of my partition key is determined by PK or by PK+SK. If it’s the latter, then an operation like “Delete all items where PK=X” would be less of a hot key / hot partition concern, because the SK would distribute the operation across many partitions. But if all items with the same PK are bundled together in one partition (or a couple), then it will definitely be a problem.

AWS
asked 2 years ago · 870 views
1 Answer
Accepted Answer

The partition key is hashed to determine which partition an item goes to. Within that partition, the items in each partition key's item collection are held in sort key order. If an item collection grows large enough, it might be split across multiple partitions. In that case a split point is chosen in the sort key range: items with an SK before the split point go into one partition, and items with an SK after it go into another. Only the PK is ever hashed; the SK just decides where the collection is cut.

Analogy time. Think of DynamoDB like a set of phone books. The PK is like a city name, and its value determines which phone book to use. The SK would be like the names in each book, held in linear order. You want good dispersion of the PKs so you can have lots of phone books and lots of parallel processing. Some cities, like NYC, are big enough that you split the names across a few books, like A-M and N-Z.

To be a bit more detailed, the hash of the city name determines the shelf on which the phone book for that city can be found. So hash the PK, find the shelf, find the right book on the shelf, use the sorted values within the book. It's all very efficient. Some cities can take up a whole shelf or more than one shelf.
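As a rough illustration of the shelf/book lookup, here is a minimal Python sketch. It is a toy model, not how DynamoDB is implemented: the real hash function and partition layout are internal to the service, so `NUM_SHELVES`, `SPLIT_POINTS`, `shelf_for`, and `locate` are all hypothetical names, and MD5 is only a stand-in hash. It also assumes a split collection stays on one shelf, which the real service does not guarantee.

```python
import bisect
import hashlib
from typing import Tuple

NUM_SHELVES = 8  # hypothetical number of shelves (base partitions)

# Hypothetical split points for item collections that outgrew one book.
# "NYC" is cut at "N": names before "N" are in book 0, the rest in book 1.
SPLIT_POINTS = {"NYC": ["N"]}


def shelf_for(pk: str) -> int:
    """Pick a shelf from the partition key alone (MD5 is a stand-in hash)."""
    digest = hashlib.md5(pk.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHELVES


def locate(pk: str, sk: str) -> Tuple[int, int]:
    """Return (shelf, book): the shelf comes from hash(pk), the book from the SK range."""
    splits = SPLIT_POINTS.get(pk, [])
    # The sort key is compared against split points; it is never hashed.
    book = bisect.bisect_right(splits, sk)
    return shelf_for(pk), book


if __name__ == "__main__":
    print(locate("NYC", "Adams"))   # book 0 (A-M)
    print(locate("NYC", "Nguyen"))  # same shelf, book 1 (N-Z)
    print(locate("Boise", "Zed"))   # collection never split, so book 0
```

In this model, two items with the same PK always hash to the same shelf, and the SK only chooses between the books produced by a split. That is why PK cardinality, not PK+SK cardinality, is what spreads load across the keyspace.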

Some individual names are read or written so often that, to spread the traffic, their page gets a shelf of its own.

AWS
answered 2 years ago
EXPERT
reviewed 24 days ago
