How to create items in DynamoDB without throttling?


Hi,

I have a DynamoDB table with 6,000 WCU. Using the AWS Python SDK with 20 threads, I wrote 10,000 items, each 16 KB, with UUIDs as the partition key and sort key so the data is evenly distributed across partitions. Then I deleted all the data, again using 20 threads. Although every operation succeeded, the execution time was long: some deletions completed in 10 ms, but others took 6 or even 13 seconds.
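Roughly, my writer looks like this (a simplified sketch; the table name, attribute names, and payload are placeholders, not my real ones):

import uuid
from concurrent.futures import ThreadPoolExecutor

import boto3

table = boto3.resource('dynamodb').Table('MyTable')  # placeholder table name
PAYLOAD = 'x' * 16_000  # roughly 16 KB per item

def put_one(_):
    # UUIDs for both key attributes, so items should spread across partitions.
    # (Note: boto3 recommends a separate resource instance per thread for
    # strict thread safety; this is simplified.)
    table.put_item(Item={
        'PK': str(uuid.uuid4()),
        'SK': str(uuid.uuid4()),
        'payload': PAYLOAD,
    })

with ThreadPoolExecutor(max_workers=20) as executor:
    list(executor.map(put_one, range(10_000)))  # list() surfaces any errors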

Upon checking the DynamoDB metrics, I noticed a high number of WriteThrottleEvents, and only about 1,000 WCUs were actually utilized. Is this due to uneven data distribution across partitions? Could anybody provide an example of writing 10,000 16 KB items with 20 threads without encountering throttling? (The WCU can be set to any value, or the table can be set to on-demand mode.)

(Batch writes are not an option; I tried them, and they also get throttled.)

Thank you!

Ethan
asked 2 months ago · 165 views
2 Answers
Accepted Answer

My assumption here is that you read the items to delete using a Scan operation. When you do this, items are returned in the same sort order in which they are stored in DynamoDB. So when you then call DeleteItem in that order, you essentially create a "rolling hot key": all of your deletes target a single partition, slowly moving through the partitions one at a time in sequential order.

Here is my advice to obtain better performance:

  1. If you use Scan, either shuffle the returned items or assign each of your worker threads a different segment using a Parallel Scan (see the example below).
  2. Pre-warm your table. With parallel scanning in place and the table pre-warmed, you will be able to scale effortlessly, since your items are all unique. (A sketch of one way to pre-warm follows the delete example below.)
import boto3
from concurrent.futures import ThreadPoolExecutor

dynamodb = boto3.resource('dynamodb')
TABLE_NAME = 'MyTable'
PARTITION_KEY = 'PK'
SORT_KEY = 'SK'  # your table has a composite key, so the sort key is needed too
table = dynamodb.Table(TABLE_NAME)

def scan_and_delete(segment, total_segments):
    try:
        # Each worker scans its own segment, so the threads walk
        # different parts of the table in parallel instead of all
        # hitting the same partition.
        scan_kwargs = {
            'Segment': segment,
            'TotalSegments': total_segments,
            'ProjectionExpression': f'{PARTITION_KEY}, {SORT_KEY}'
        }

        while True:
            response = table.scan(**scan_kwargs)

            for item in response.get('Items', []):
                # DeleteItem needs the full primary key: partition key and sort key.
                key = {PARTITION_KEY: item[PARTITION_KEY], SORT_KEY: item[SORT_KEY]}
                table.delete_item(Key=key)
                print(f"Deleted item: {key}")

            # Keep paginating until this segment is exhausted.
            if 'LastEvaluatedKey' in response:
                scan_kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']
            else:
                break
    except Exception as e:
        print(f"Error in segment {segment}: {e}")

NUM_THREADS = 20

with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
    for i in range(NUM_THREADS):
        executor.submit(scan_and_delete, i, NUM_THREADS)
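For point 2, one common way to pre-warm is to temporarily raise provisioned capacity so DynamoDB splits the table across more partitions; the split persists afterwards. A minimal sketch (the capacity numbers are illustrative, and this assumes you are free to change billing modes):

import boto3

client = boto3.client('dynamodb')

# Temporarily raise provisioned capacity so DynamoDB allocates more partitions.
client.update_table(
    TableName='MyTable',
    BillingMode='PROVISIONED',
    ProvisionedThroughput={
        'ReadCapacityUnits': 12000,   # illustrative values
        'WriteCapacityUnits': 12000,
    },
)
client.get_waiter('table_exists').wait(TableName='MyTable')

# If you prefer on-demand, switch back afterwards.
# Note: the billing mode can only be changed once per 24 hours.
client.update_table(TableName='MyTable', BillingMode='PAY_PER_REQUEST')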
AWS
EXPERT
answered 2 months ago
  • Hi Leeroy,

    Thank you very much! You pointed out the real cause of the performance bottleneck. I had considered parallel retrieval before but didn't quite understand how it worked. Thanks to your example, I'm no longer facing throttling issues. Thank you!

    Best Regards, Ethan

  • Thanks for the feedback, Ethan; very happy to have helped :)


Hi,

If you can't use your full WCUs, you may well be facing the so-called "hot keys" issue. Part 1 (Loading, with numbers and code) of the series is probably the most interesting one for you.

So, can you check whether your data is structured properly to avoid such problems?

This blog post may also help you better understand hot keys: https://medium.com/@leeroy.hannigan/optimizing-dynamodb-queries-using-key-sharding-f3eb4d7f78f7
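In a nutshell, the idea of key sharding in that post is to spread writes for a single logical key across several physical partition key values. A minimal sketch (the table name, attribute names, and shard count are illustrative):

import random
import uuid

import boto3

table = boto3.resource('dynamodb').Table('MyTable')  # illustrative name
NUM_SHARDS = 10  # illustrative shard count

def sharded_put(base_pk, payload):
    # Appending a random shard suffix spreads writes for a popular key
    # across NUM_SHARDS partition key values instead of hammering one.
    table.put_item(Item={
        'PK': f'{base_pk}#{random.randrange(NUM_SHARDS)}',
        'SK': str(uuid.uuid4()),
        'payload': payload,
    })

The trade-off is on the read side: a query for base_pk then has to fan out across all NUM_SHARDS key values.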

Best,

Didier

AWS
EXPERT
answered 2 months ago
  • Hi Didier,

    Thank you!! Part 1 helped me better understand how to evenly distribute data across partitions.

    Best Regards, Ethan
