DynamoDB: purging 400M items from a 1-billion-item table, and BatchWriteItem throttling


Example scenario: our company is splitting. Our DynamoDB table has 1 billion items. We made a copy of the table for the other company, and now we need to delete 400M of the 1B items from our copy. Neither TTL nor deleting and recreating the table is an option, because we need to keep our part of the records. We looked at using a parallel scan to read the table and feeding the items to delete into BatchWriteItem delete requests. Is there a better option to consider?
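For reference, the scan-then-BatchWriteItem approach described above boils down to grouping keys into calls of at most 25 delete requests, which is the BatchWriteItem per-call limit. The sketch below only builds the RequestItems payloads in the low-level DynamoDB JSON format; the attribute names customerid and LOBid are taken from the key schema given in the question, and the table name is a hypothetical placeholder.

```python
from typing import Iterable, Iterator

# BatchWriteItem accepts at most 25 put/delete requests per call.
MAX_BATCH_SIZE = 25

def build_delete_batches(
    table_name: str, keys: Iterable[tuple[int, str]]
) -> Iterator[dict]:
    """Yield BatchWriteItem RequestItems payloads with at most 25 deletes each.

    Each key is a (customerid, LOBid) pair. Attribute names are assumed from
    the question; values use the low-level DynamoDB JSON format.
    """
    batch: list[dict] = []
    for customer_id, lob_id in keys:
        batch.append({
            "DeleteRequest": {
                "Key": {
                    # Number attributes are sent as strings in low-level JSON.
                    "customerid": {"N": str(customer_id)},
                    "LOBid": {"S": lob_id},
                }
            }
        })
        if len(batch) == MAX_BATCH_SIZE:
            yield {table_name: batch}
            batch = []
    if batch:  # flush the final partial batch
        yield {table_name: batch}
```

Each payload would then be passed to boto3 as `client.batch_write_item(RequestItems=payload)`.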

Side note: initial tests with parallel scan and BatchWriteItem are hitting throttling on a single partition (the 1,000 WCU per-partition limit), even though the table has plenty of WCU capacity. Any other suggested approaches or thoughts on addressing the single-partition WCU issue?
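Whatever ordering is used, throttled deletes inside a BatchWriteItem call are not necessarily raised as errors: when only some requests can be processed, the response lists the rest under UnprocessedItems, and AWS recommends resubmitting those with exponential backoff. The sketch below assumes `batch_write` is a callable wrapping boto3's `batch_write_item`; the function name and default delays are illustrative choices, not an AWS-provided helper.

```python
import random
import time
from typing import Callable

def write_with_retries(
    batch_write: Callable[[dict], dict],
    request_items: dict,
    max_attempts: int = 8,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Resubmit UnprocessedItems until none remain or attempts run out.

    batch_write is assumed to take RequestItems and return a response dict
    that may contain "UnprocessedItems". Returns whatever is still
    unprocessed (an empty dict on success).
    """
    pending = request_items
    for attempt in range(max_attempts):
        response = batch_write(pending)
        pending = response.get("UnprocessedItems", {})
        if not pending:
            return {}
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter before resubmitting leftovers.
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
    return pending
```

Usage with boto3 might look like `write_with_retries(lambda items: client.batch_write_item(RequestItems=items), payload)`. Backoff relieves pressure on the hot partition but does not raise its 1,000 WCU ceiling.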

The table has customerid (number) as the partition key and LOBid (string) as the sort key, plus other attributes; each item is under 1 KB.
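To put the single-partition limit in perspective: a delete consumes 1 WCU per 1 KB of item size, rounded up, so with items under 1 KB each delete costs 1 WCU. The rough calculation below assumes the worst case where every delete lands on one partition capped at 1,000 WCU per second; the function names are illustrative.

```python
import math

def delete_wcu_per_item(item_size_bytes: int) -> int:
    # A delete consumes 1 WCU per 1 KB of item size, rounded up (minimum 1).
    return max(1, math.ceil(item_size_bytes / 1024))

def single_partition_seconds(
    items: int, item_size_bytes: int, partition_wcu: int = 1000
) -> float:
    """Lower bound on wall-clock time if all deletes hit one partition."""
    return items * delete_wcu_per_item(item_size_bytes) / partition_wcu

seconds = single_partition_seconds(400_000_000, 900)
print(seconds, seconds / 3600)  # 400000.0 seconds, roughly 111 hours
```

That is about 4.6 days of deleting if the work is serialized on one hot partition, while spreading the same deletes evenly over 10 partitions would cut the lower bound to about 11 hours. This is why the distribution of deletes across partitions matters more than the table's total WCU.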

AWS
Dave_G
asked 3 years ago · 994 views
1 Answer
Accepted Answer

With parallel scan, each thread scans a contiguous segment of the table. If the items to be deleted were evenly distributed across the table's key space, you would encounter table-level throttling rather than partition-level throttling. The fact that you hit partition-level throttling without exceeding the provisioned WCU indicates that the items to be deleted may reside in only a few partitions. Because consecutive items returned by a scan tend to come from the same partition, almost all items in a given BatchWriteItem call end up belonging to the same partition, and you may be deleting from only a few partitions at a time.
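The effect described above can be illustrated with a toy model. DynamoDB's actual key-to-partition mapping is internal, so the sketch assumes a hypothetical `partition_of` function under which keys read in scan order come in contiguous per-partition runs, and counts how many distinct partitions each 25-item batch touches.

```python
from typing import Callable, Sequence

def partitions_per_batch(
    keys: Sequence[int],
    partition_of: Callable[[int], int],
    batch_size: int = 25,
) -> list[int]:
    """For each consecutive batch of keys, count the distinct partitions it touches."""
    return [
        len({partition_of(k) for k in keys[i:i + batch_size]})
        for i in range(0, len(keys), batch_size)
    ]

# Hypothetical model: 10 partitions of 100 consecutive keys, read in scan order.
keys = list(range(1000))
counts = partitions_per_batch(keys, lambda k: k // 100)
print(set(counts))  # {1}: every batch targets a single partition
```

In this model every batch writes to exactly one partition, so each delete call is bounded by that one partition's WCU limit regardless of the table's total capacity.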

One way to improve performance would be to shuffle before doing the deletes. That is, the parallel scan threads push the keys of records to be deleted into a list. After the parallel scan finishes, randomly reorder the items in the list. The delete threads then take items from the list for deletion. With this approach, you increase the likelihood that the items in a single BatchWriteItem call are spread across multiple partitions, taking advantage of the write capacity of multiple partitions.
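The shuffle-then-delete idea can be sketched as follows: collect the keys, shuffle them, cut them into batches of 25, and let several delete threads pull batches from a shared queue. `delete_batch` is a hypothetical callback that would issue one BatchWriteItem call. At the real scale of 400M keys, holding and shuffling a single in-memory list in one process is likely impractical, so this only illustrates the ordering logic on a small input.

```python
import queue
import random
import threading
from typing import Callable, Hashable, Optional, Sequence

BATCH_SIZE = 25  # BatchWriteItem limit on requests per call

def shuffled_batches(
    keys: Sequence[Hashable], seed: Optional[int] = None
) -> list[list[Hashable]]:
    """Randomly reorder the collected keys, then cut them into batches of 25."""
    shuffled = list(keys)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + BATCH_SIZE] for i in range(0, len(shuffled), BATCH_SIZE)]

def delete_in_parallel(
    batches: list[list[Hashable]],
    delete_batch: Callable[[list[Hashable]], None],
    workers: int = 8,
) -> None:
    """Delete worker threads pull shuffled batches from a shared queue until it is empty."""
    work: "queue.Queue[list[Hashable]]" = queue.Queue()
    for batch in batches:
        work.put(batch)

    def worker() -> None:
        while True:
            try:
                batch = work.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            delete_batch(batch)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Shuffling does not reduce the total WCU consumed; it only changes which partitions each call hits, so concurrent calls draw on several partitions' capacity instead of queuing behind one.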

AWS
answered 3 years ago