DynamoDB stream batch size

0

I set up a DynamoDB stream with a Lambda function as the trigger. The function receives INSERT events and spins up an EC2 instance for migration. For testing, I set the batch size to 4 and the batch window to 4 seconds. As I understand it, if I add 3 records to the table (I open 3 windows to simulate 3 records being added at the same time), Lambda will not invoke the function immediately, because the batch size of 4 has not been reached and the batch window has not expired. After 4 seconds, Lambda will pick up the records and invoke the function once to handle all 3 records, which means only 1 server will be provisioned. However, I see 3 separate servers, and the CloudWatch logs show the function was invoked 3 times. Is my understanding of batch size and batch window correct?

asked 14 days ago · 50 views
3 Answers
4

Hoping to add more context about this, in addition to jzhunter's answer.

Say you have a DynamoDB table with 3 partitions, and the 3 items you wrote simultaneously land on 3 different partitions. DynamoDB Streams creates shards based on the number of partitions in the table, so there will be 3 active shards, corresponding to the 3 partitions. Each stream record is written to the shard that corresponds to the table partition holding the item.

Lambda pollers are per shard for DynamoDB Streams. So in this case, where 3 items are written simultaneously but to different partitions, there will be 3 separate Lambda invocations, because each shard has its own poller.

There would have been only 1 Lambda invocation if both of these held:

  1. All the items landed on the same DynamoDB partition
  2. The stream records were created "within" 4 seconds as per the batching window
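
To make the two conditions above concrete, here is a small simulation. It is only a simplified model, not Lambda's actual polling logic: it assumes each shard collects records into a batch until either the batch size is reached or the window (measured from the first record in the batch) has elapsed. The function name `count_invocations` and the shard labels are made up for illustration.

```python
from collections import defaultdict


def count_invocations(records, batch_size=4, window_seconds=4.0):
    """Count invocations under a simplified per-shard batching model.

    records: iterable of (shard_id, arrival_time_in_seconds) tuples.
    Assumption: every shard batches independently of the others.
    """
    arrivals_by_shard = defaultdict(list)
    for shard_id, arrival in records:
        arrivals_by_shard[shard_id].append(arrival)

    invocations = 0
    for arrivals in arrivals_by_shard.values():
        arrivals.sort()
        window_start = None
        in_batch = 0
        for t in arrivals:
            starts_new_batch = (
                in_batch == 0
                or in_batch == batch_size
                or t - window_start >= window_seconds
            )
            if starts_new_batch:
                # A new batch means one more invocation for this shard.
                invocations += 1
                window_start = t
                in_batch = 1
            else:
                in_batch += 1
    return invocations


# Three simultaneous writes that land on three different shards.
spread_out = [("shard-a", 0.0), ("shard-b", 0.0), ("shard-c", 0.0)]
# Three writes within the window that all land on one shard.
same_shard = [("shard-a", 0.0), ("shard-a", 1.0), ("shard-a", 2.0)]

print(count_invocations(spread_out))  # 3
print(count_invocations(same_shard))  # 1
```

Under this model the batch size and window never combine records across shards, which is why the first case produces three invocations even though all three writes happen at the same moment.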
AWS EXPERT
answered 13 days ago
EXPERT
reviewed 13 days ago
  • Thanks for your response. This means there will be only 1 Lambda invocation if I update items with the same partition key, right? My table uses the record ID as the partition key, which also acts as the primary key. Since the primary key cannot be duplicated, that is why my function is invoked separately.

3
Accepted Answer

To understand this behaviour you first have to understand the mapping of partitions to Lambda instances. Each partition in DynamoDB maps 1:1:1 with a shard in the stream and a Lambda instance:

(Image: mapping of DynamoDB partitions to stream shards to Lambda instances)

If the 3 items you write to DynamoDB have the same partition key, then you might* see the behaviour you expect. However, if they have different partition keys, they will most likely end up in different partitions, then different shards, and ultimately invoke anywhere from one to several different Lambda instances.

DynamoDB streams only guarantee that the same item will end up in the same Lambda instance, meaning if you update the same item 3 times, you are guaranteed to see the behaviour you expect, otherwise there are no guarantees.

*DynamoDB does not guarantee that items with the same partition key will be in the same physical partition. It only offers guarantees at the item level (the full primary key, not the partition key alone).
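
Since a single invocation can carry several stream records, the handler should treat the whole batch as one unit of work rather than assuming one record per call. The following is a minimal sketch of that idea. `provision_server` is a hypothetical placeholder (the real EC2 provisioning from the question is not shown here); the `Records`, `eventName`, and `dynamodb.Keys` fields follow the DynamoDB stream event shape that Lambda delivers.

```python
def provision_server(keys):
    """Hypothetical stand-in for the EC2 provisioning step in the question."""
    return {"provisioned_for": list(keys)}


def handler(event, context):
    """Collect every INSERT record in the batch and provision once for all."""
    inserted_keys = [
        record["dynamodb"]["Keys"]
        for record in event.get("Records", [])
        if record.get("eventName") == "INSERT"
    ]
    if not inserted_keys:
        # Nothing to migrate in this batch (only MODIFY/REMOVE events).
        return {"provisioned_for": []}
    return provision_server(inserted_keys)
```

This gives one provisioning call per invocation. It does not merge work across shards: if the items arrive on different shards, each shard's invocation still provisions separately.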

AWS EXPERT
answered 13 days ago
EXPERT
reviewed 13 days ago
  • Thanks for the image describing the behavior of DynamoDB Streams and Lambda. It matches my question exactly. I use the record ID as the partition key, which means the items will be placed in different partitions, leading to different shards.

  • You're welcome. Pictures say a thousand words.

0

DynamoDB sends stream records out using shards (in order to scale), and the settings you're configuring are per shard not per table. If you insert three items and they're landing in three separate shards, then you'd get the behavior you're describing.

Search for "shard" in the docs: https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html
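
For reference, the two settings from the question could be expressed with boto3 roughly as follows. This is a sketch under assumptions: the stream ARN and function name are supplied by the caller, `build_mapping_settings` and `create_mapping` are illustrative helper names, and `StartingPosition` is set to `LATEST` only as an example. The keyword names are the parameters of the Lambda `create_event_source_mapping` call.

```python
def build_mapping_settings(stream_arn, function_name,
                           batch_size=4, window_seconds=4):
    """Build keyword arguments for create_event_source_mapping.

    BatchSize and MaximumBatchingWindowInSeconds apply to each shard,
    not to the table as a whole.
    """
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",  # example choice, not from the question
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": window_seconds,
    }


def create_mapping(settings):
    """Create the mapping. Requires boto3 and AWS credentials."""
    import boto3  # third-party; imported here so the helper above has no dependencies

    return boto3.client("lambda").create_event_source_mapping(**settings)
```

Keeping the settings in a plain dictionary makes them easy to inspect or log before anything is created in the account.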

AWS
answered 13 days ago
