Regarding the startup order of DynamoDB Streams and Lambda functions

In a table configured with a partition key and a sort key, I insert multiple records with the same partition key.

A Lambda function is configured to process the table's DynamoDB stream, with ParallelizationFactor set to 1.

In this case, are the Lambda invocations always executed in series, in the order the records were inserted?

For example, first of all prepare the following table.

groupId            seq           other attr
<Partition Key>    <Sort Key>

Next, add the following data in order.
① groupId = 1, seq = 1
② groupId = 1, seq = 2
③ groupId = 1, seq = 3
④ groupId = 1, seq = 4

The resulting table is as follows.

groupId <PK>    seq <SK>    other attr
1               1
1               2
1               3
1               4

In this case, are the Lambda invocations always executed in series, in the order the records were added? Or is it possible for them to run in parallel?

Will Lambda always start in series if the records have the same partition key (but different sort keys)?

tamura
asked 2 years ago, 391 views
4 Answers
3
Accepted Answer

The other answers for this question seem to be slightly vague and do not allow you to fully understand the ordering of items in the stream. While some answers highlight key parts, they tend not to provide you with the big picture. Let me try to explain how streams order items:

First of all let's understand what a DynamoDB Partition is:

To write an item to the table, DynamoDB calculates the hash value of the partition key to determine which partition should contain the item. In that partition, several items could have the same partition key value. So DynamoDB stores the item among the others with the same partition key, in ascending order by sort key.

Next the relationship between partitions and shards:

Each partition of a DynamoDB table maps 1:1 to an (active) DynamoDB Streams shard. This means that items which share the same partition key will enter the same shard of the stream (an important caveat follows later).

Finally the relationship between shards and Lambda consumers:

Each shard on your DynamoDB stream maps 1:1 with a Lambda container. This means that all items which enter a given shard will be processed by the same Lambda function, and they will maintain the same order in which they were written to the partition.

In summary of the above 3 points:

The relationship between partition:shard:lambda is 1:1:1. We now know that each item which is written to any given partition will enter a given shard in the order they were written and ultimately be consumed in Lambda maintaining the same ordering.
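
As a rough illustration of the 1:1:1 relationship, the toy simulation below hashes each partition key to a partition, appends the write to the shard paired with that partition, and lets one consumer read each shard in order. The names `NUM_PARTITIONS`, `partition_for`, and `route_writes` are hypothetical, and MD5 is only a stand-in: DynamoDB's real hash function and partition count are internal and not exposed.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages the real partition count


def partition_for(partition_key: str) -> int:
    """Toy stand-in for DynamoDB's internal hash of the partition key."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


def route_writes(writes):
    """Append each (groupId, seq) write to the shard paired with its partition."""
    shards = defaultdict(list)
    for group_id, seq in writes:
        shards[partition_for(str(group_id))].append((group_id, seq))
    return shards


writes = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2)]
shards = route_writes(writes)
for shard_id, records in sorted(shards.items()):
    # One consumer per shard reads that shard's records in write order.
    print(f"shard {shard_id}: {records}")
```

In this simplified model every write for groupId 1 lands in one shard and keeps its write order. The caveat described next is exactly the case this model leaves out.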

Now the caveat:

For the most part, items which share the same partition key will share the same partition, which means that ordering is maintained for an Item Collection (all items sharing the same partition key). However, it is not guaranteed that an Item Collection will remain on a single partition. An Item Collection can be split across multiple partitions for several reasons:

  1. The throughput for that Item Collection exceeds the hard limit of 1000 WCU or 3000 RCU and Adaptive Capacity has split the collection to provide you more throughput.
  2. The Item Collection exceeded the 10 GB storage size limit of a partition and the collection was split to provide you more storage.

It is for this reason that DynamoDB Streams guarantees ordering only at the item level, where an item is identified by its full primary key (the same partition key and the same sort key).

DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours. Applications can access this log and view the data items as they appeared before and after they were modified, in near-real time.

A DynamoDB stream is an ordered flow of information about changes to items in a DynamoDB table. When you enable a stream on a table, DynamoDB captures information about every modification to data items in the table.
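
To make the item-level guarantee concrete, here is a minimal sketch of a Lambda handler that groups a batch of stream records by full primary key and relies on ordering only within each group. The `Records`, `eventName`, `dynamodb`, and `Keys` fields follow the DynamoDB Streams event shape; the attribute names `groupId` and `seq` come from the question, while `item_key`, `handler`, and the sample event are illustrative assumptions.

```python
from collections import defaultdict


def item_key(record):
    """Return the (partition key, sort key) pair of a stream record."""
    keys = record["dynamodb"]["Keys"]
    return (keys["groupId"]["N"], keys["seq"]["N"])


def handler(event, context=None):
    """Group a batch by item; order is relied on only within each group."""
    per_item = defaultdict(list)
    for record in event["Records"]:
        per_item[item_key(record)].append(record["eventName"])
    return dict(per_item)


def make_record(event_name, group_id, seq):
    """Build a minimal stream record for local experimentation (hypothetical helper)."""
    keys = {"groupId": {"N": str(group_id)}, "seq": {"N": str(seq)}}
    return {"eventName": event_name, "dynamodb": {"Keys": keys}}


sample_event = {"Records": [
    make_record("INSERT", 1, 1),
    make_record("INSERT", 1, 2),
    make_record("MODIFY", 1, 1),
]}
print(handler(sample_event))
# {('1', '1'): ['INSERT', 'MODIFY'], ('1', '2'): ['INSERT']}
```

The INSERT and MODIFY for item (1, 1) can be trusted to arrive in that order. The relative order of items (1, 1) and (1, 2) is the part that should not be depended on.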


Direct Answer:

Next, add the following data in order.

① groupId = 1, seq = 1
② groupId = 1, seq = 2
③ groupId = 1, seq = 3
④ groupId = 1, seq = 4

As these items have different sort keys, the ordering is not guaranteed, as mentioned above. While it's not guaranteed, for the most part the items will be in order, but you should be aware of the aforementioned caveat and ensure you do not take a dependency on the items being in order.

AWS EXPERT, answered 2 years ago
EXPERT, reviewed 12 days ago
  • Great answer! This is just the answer that I wanted.

0

A DynamoDB table is built from partitions. Each partition has its own shard in the DynamoDB stream. Records in the shard keep the order in which the changes were made to the table. If you enable parallel processing on the stream, records for a specific primary key will always be processed by the same function, so they will not be processed in parallel.

AWS EXPERT, Uri
answered 2 years ago
  • Thank you, Uri.

    I updated the question.

    When registering multiple items in a table configured with a partition key and a sort key, if all items have the same partition key, will they always be processed in series rather than in parallel?

  • @tamura No, they are not guaranteed to be in order. See my answer.

0

DynamoDB stores data in partitions, which are based on either a partition key only or both a partition key and a sort key. When you enable a stream on a DynamoDB table, DynamoDB creates at least one shard per partition.

When a Lambda function is configured to process DynamoDB streams, one instance of the Lambda function is invoked per shard.

When you use DynamoDB to configure multiple Lambda functions with a stream to enable parallel processing of data, each function is concurrently invoked per shard.

This blog post provides more granular details that you can refer to: https://aws.amazon.com/blogs/database/how-to-perform-ordered-data-replication-between-applications-by-using-amazon-dynamodb-streams/
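
For reference, the ParallelizationFactor setting from the question is a property of the Lambda event source mapping. The sketch below only assembles the arguments for Lambda's CreateEventSourceMapping API. The ARN is a placeholder, and the function name `process-group-events` and the helper `build_mapping_kwargs` are hypothetical.

```python
def build_mapping_kwargs(stream_arn: str, function_name: str) -> dict:
    """Assemble arguments for Lambda's CreateEventSourceMapping API."""
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",  # or "TRIM_HORIZON" to read from the oldest record
        "BatchSize": 100,
        "ParallelizationFactor": 1,    # one concurrent batch per shard
    }


kwargs = build_mapping_kwargs(
    "arn:aws:dynamodb:REGION:ACCOUNT_ID:table/TABLE_NAME/stream/STREAM_LABEL",
    "process-group-events",
)
print(kwargs)

# With boto3 installed and AWS credentials configured, the mapping could be
# created like this (left commented out so the sketch runs offline):
# import boto3
# boto3.client("lambda").create_event_source_mapping(**kwargs)
```

With a factor of 1, each shard is read by at most one concurrent invocation, so any parallelism comes from the number of shards.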

AWS EXPERT, answered 2 years ago
AWS EXPERT, reviewed 2 years ago
  • Thank you, AWS-User-Nitin.

    I think the following is important: "DynamoDB creates at least one shard per partition."

    Can one partition key create multiple shards?

    For example, take a table with the following primary key configuration.

    groupId = partition key, seq = sort key

    Also, a Lambda function is configured to process the DynamoDB stream, with "ParallelizationFactor" set to 1.

    Assume that the following data is registered in order.

    ① groupId = 1, seq = 1
    ② groupId = 1, seq = 2
    ③ groupId = 1, seq = 3
    ④ groupId = 1, seq = 4

    In this case, will Lambda always start in series, in the event order 1 → 2 → 3 → 4? Or will the invocations sometimes start in parallel?

0

Hi. Based on the documentation, a shard represents a partition and the order of changes is guaranteed.

However, to my knowledge, the order of changes is guaranteed only for a single item, not across multiple items.

My suggestion is to enforce the order in your own design: instead of relying on the Lambda function behind the stream, use a fan-out with, for example, SQS or SNS, and add a validation step. In your example, sk 4 must not be processed if sk 3 has not been processed yet.

In any case, handle this on your side instead of relying on the infrastructure (DynamoDB Streams) to meet business needs.

Your design can be something like:

Streams > Lambda > SQS > Lambda, DynamoDB

If everything is OK, you just check the last sk processed in your second DynamoDB table and then process n+1.

If processing n+1 is not possible because n has not been processed yet, send that message to a deferred SQS queue with a delay in seconds. The deferred queue also triggers the same Lambda function, but after some wait time, which gives item n time to be processed in the meantime.
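
The check-and-defer step described above could be sketched roughly as follows. This is an in-memory illustration only: a dict stands in for the second DynamoDB table, a deque stands in for the delayed SQS queue, and `OrderedGate` and its methods are hypothetical names. It also assumes that seq starts at 1 and increases by 1 within each groupId, which the question's sample data suggests but does not state.

```python
from collections import deque


class OrderedGate:
    """Process messages per groupId strictly in seq order, deferring early ones."""

    def __init__(self):
        self.last_seq = {}       # stand-in for the tracking table: groupId -> last seq done
        self.deferred = deque()  # stand-in for the delayed SQS queue
        self.processed = []      # records the order in which work was actually done

    def handle(self, message):
        group_id, seq = message["groupId"], message["seq"]
        expected = self.last_seq.get(group_id, 0) + 1
        if seq == expected:
            self.processed.append((group_id, seq))
            self.last_seq[group_id] = seq
            return "processed"
        if seq > expected:
            self.deferred.append(message)  # predecessor not done yet, retry later
            return "deferred"
        return "dropped"  # already handled: treat as a duplicate delivery

    def redeliver_deferred(self):
        """Retry deferred messages until a full pass makes no progress."""
        stalled = 0
        while self.deferred and stalled < len(self.deferred):
            outcome = self.handle(self.deferred.popleft())
            stalled = stalled + 1 if outcome == "deferred" else 0


gate = OrderedGate()
for seq in (1, 3, 2, 4):  # seq 3 arrives before seq 2
    gate.handle({"groupId": 1, "seq": seq})
gate.redeliver_deferred()
print(gate.processed)
# [(1, 1), (1, 2), (1, 3), (1, 4)]
```

In a real deployment the read-and-update of the last processed seq would need to be atomic (for example a conditional write), since two consumers could otherwise both decide that they hold the next message.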

Hope that helps.

answered 2 years ago
EXPERT, reviewed 2 months ago
