DynamoDB: how does partitioning work once the sort key is involved?

My question concerns this AWS blog post: https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/

Where it says: “DynamoDB splits partitions by sort key if the collection size grows bigger than 10 GB.”

Does that mean it will use the sort key for partitioning, as in PK+SK will be used to calculate the hash? Or just that items are sorted by SK, so once a partition fills, the next partition picks up where the previous one left off, while only the PK is still used to calculate the hash?

I’m wondering whether the cardinality of my partition key is determined by PK alone or by PK+SK. If it’s the latter, then an operation like “Delete all items where PK=X” would not be as much of a hot key / hot partition concern, because the SK should distribute the operation across many partitions. But if all items with the same PK are bundled together in one partition (or a couple), then it will definitely be a problem.

AWS
asked 2 years ago · 885 views
1 Answer
Accepted Answer

The partition key is hashed to determine which partition the item goes into. Within that partition, the items in the item collection for each partition key are held in sort key order. If the item collection grows large enough, it might be split across multiple partitions. In that case a split point is chosen in the sort key range: items with an SK before that point go into one partition, and items with an SK after it go into another. Only the PK is ever hashed; the SK only decides where within the item collection an item sits.
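The routing described above can be sketched as a toy model in Python. This is only an illustration, not DynamoDB's implementation: the real hash function, partition count, and split points are internal to the service. MD5, `NUM_HASH_RANGES`, `SPLIT_POINTS`, and `locate` are all hypothetical names chosen for this sketch.

```python
import bisect
import hashlib

# Hypothetical number of hash ranges; DynamoDB manages this internally.
NUM_HASH_RANGES = 4

# Hypothetical sort-key split points for item collections that outgrew
# a single partition. "NYC" is split once, at sort key "N".
SPLIT_POINTS = {"NYC": ["N"]}


def hash_range(pk: str) -> int:
    """Map a partition key to a hash range. Only the PK is hashed.

    MD5 is a stand-in; DynamoDB's actual hash function is not public.
    """
    digest = hashlib.md5(pk.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_HASH_RANGES


def locate(pk: str, sk: str) -> tuple[int, int]:
    """Return (hash range, index of the sort-key slice within the collection)."""
    splits = SPLIT_POINTS.get(pk, [])
    # Items are kept in SK order, so a sorted search finds the slice.
    return hash_range(pk), bisect.bisect_right(splits, sk)


if __name__ == "__main__":
    for pk, sk in [("NYC", "Adams"), ("NYC", "Nguyen"), ("Boise", "Zed")]:
        print(pk, sk, locate(pk, sk))
```

In this model the sort key never changes the hash: both NYC items get the same hash range and differ only in which sort-key slice they fall into. An item collection with no split points stays in a single slice, so "Delete all items where PK=X" on a small collection still concentrates on one partition.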

Analogy time. Think of DynamoDB like a set of phone books. The PK is like a city name, and its value determines which phone book to use. The SK would be like the names in each book, held in linear order. You want good dispersion of the PKs so you can have lots of phone books and lots of parallel processing. A city as big as NYC has enough names that you split them across a few books, like A-M and N-Z.

To be a bit more detailed, the hash of the city name determines the shelf on which the phone book for that city can be found. So hash the PK, find the shelf, find the right book on the shelf, use the sorted values within the book. It's all very efficient. Some cities can take up a whole shelf or more than one shelf.

Some individual names can be so commonly read/written that to spread visitor traffic they get their own shelf just for their page.

AWS
answered 2 years ago
EXPERT
reviewed a month ago
