My assumption here is that you read the items to delete using a Scan operation. When you do this, items are returned in the same sort order in which they are stored in DynamoDB. So when you then call DeleteItem in that order, you are essentially creating a "rolling hot key": all of your deletes are targeted at one partition at a time, slowly moving through the partitions in sequential order.
Here is my advice to obtain better performance:
- If you use Scan, either shuffle the data, or assign each one of your worker threads a different segment using Parallel Scan.
- Pre-warm your table. With the above in place and the table pre-warmed, you will be able to scale effortlessly, since your deletes are spread across unique keys.
```python
import boto3
from concurrent.futures import ThreadPoolExecutor

dynamodb = boto3.resource('dynamodb')
TABLE_NAME = 'MyTable'
PRIMARY_KEY = 'PK'
table = dynamodb.Table(TABLE_NAME)

def scan_and_delete(segment, total_segments):
    try:
        # Each worker scans only its own segment of the table
        scan_kwargs = {
            'Segment': segment,
            'TotalSegments': total_segments,
            'ProjectionExpression': PRIMARY_KEY
        }
        while True:
            response = table.scan(**scan_kwargs)
            items = response.get('Items', [])
            for item in items:
                key = {k: item[k] for k in item if k in [PRIMARY_KEY]}
                table.delete_item(Key=key)
                print(f"Deleted item: {key}")
            # Paginate until the segment is exhausted
            if 'LastEvaluatedKey' in response:
                scan_kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']
            else:
                break
    except Exception as e:
        print(f"Error in segment {segment}: {e}")

NUM_THREADS = 20
with ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
    for i in range(NUM_THREADS):
        executor.submit(scan_and_delete, i, NUM_THREADS)
```
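If you prefer the shuffle option from the first bullet instead of Parallel Scan, it could look something like this, as a minimal sketch. Here `shuffled_delete_keys` and `delete_page` are hypothetical helper names, the `table` argument is assumed to be a boto3 Table resource, and `batch_writer()` is used so deletes are grouped into BatchWriteItem calls:

```python
import random

def shuffled_delete_keys(items, key_attr='PK'):
    """Return the keys from one scan page in random order, so deletes
    don't walk the partitions in storage order."""
    keys = [{key_attr: item[key_attr]} for item in items]
    random.shuffle(keys)
    return keys

def delete_page(table, items, key_attr='PK'):
    # batch_writer() buffers deletes and sends them as BatchWriteItem
    # requests, which reduces per-item request overhead.
    with table.batch_writer() as batch:
        for key in shuffled_delete_keys(items, key_attr):
            batch.delete_item(Key=key)
```

Note that shuffling only randomizes within each scan page, so Parallel Scan generally spreads load better across the whole table.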
Hi,
If you can't use your full WCUs, you may well face the so-called "hot keys" issue: for all details, see
- https://aws.amazon.com/blogs/database/part-1-scaling-dynamodb-how-partitions-hot-keys-and-split-for-heat-impact-performance/
- https://aws.amazon.com/blogs/database/part-2-scaling-dynamodb-how-partitions-hot-keys-and-split-for-heat-impact-performance/
Part 1 (with numbers and code) is probably the most interesting one for you.
So, can you check if your data is structured properly to avoid such problems?
This blog post may also help you better understand hot keys: https://medium.com/@leeroy.hannigan/optimizing-dynamodb-queries-using-key-sharding-f3eb4d7f78f7
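As a rough illustration of the key-sharding idea from that post (the shard count, key format, and function name below are made-up examples, not taken from the article):

```python
import random

NUM_SHARDS = 10  # example value; choose based on your expected write throughput

def sharded_pk(base_key: str) -> str:
    """Append a random shard suffix so writes to one logical key are
    spread across NUM_SHARDS physical partition keys."""
    return f"{base_key}#{random.randint(0, NUM_SHARDS - 1)}"
```

The trade-off is on the read side: retrieving all items for a logical key then means querying each of the NUM_SHARDS partition keys and merging the results.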
Best,
Didier
Hi Didier,
Thank you!! Part 1 helped me better understand how to evenly distribute data across partitions.
Best Regards, Ethan
Hi Leeroy,
Thank you very much!!! You pointed out the real cause of the performance bottleneck! I had considered parallel retrieval before but didn't quite understand how it worked. Thanks to your example, I'm no longer facing throttling issues. Thank you!
Best Regards, Ethan
Thanks for the feedback Ethan, very happy to have helped :)