
Managing State in Lambda Functions Processing Kinesis Data Streams


Hello everybody,

I am developing an application that processes messages from an Amazon Kinesis Data Stream using AWS Lambda functions. My current approach assumes that the same instance of a Lambda function will handle all messages from a single shard, allowing me to store intermediate data in the execution environment's memory (e.g., global variables) for the lifetime of that instance.
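To make the pattern concrete, here is a minimal Python sketch of what I mean; the message field `key` is just an illustrative name, not my real schema:

```python
import base64
import json

# Module-level state: survives between invocations only for as long as
# this particular execution environment is reused.
seen_counts = {}

def handler(event, context):
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        key = payload.get("key", "unknown")  # illustrative field name
        seen_counts[key] = seen_counts.get(key, 0) + 1
    # Intermediate results accumulate in seen_counts across invocations,
    # on the assumption that the same instance keeps handling this shard.
```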

I am reaching out to confirm whether this assumption is accurate and if this approach to state management within a Lambda function is advisable.

Specifically, I am concerned about the following:

  1. Instance Reuse Across Shard Messages: Can I reliably expect that the same Lambda function instance will process all messages from a particular shard consecutively, allowing for effective use of in-memory storage for state management? Is there a possibility that during the processing of messages from the same shard, one instance may be replaced by another, potentially disrupting the continuity of state? If so, how might this impact the effectiveness of using in-memory storage for state within a single processing sequence?

  2. Implications of Scaling: How does the scaling behavior of Lambda functions, in response to fluctuations in the volume of incoming messages, affect the continuity of processing messages from the same shard? Specifically, I am interested in understanding how Lambda's scaling could impact state management within my application.

To illustrate my concern: If I am processing a sequence of related messages from a single shard, and relying on the assumption that they will be handled by the same instance of my Lambda function, what risks or limitations should I be aware of?

Thank you! Best, Mikhail

Asked 8 months ago · Viewed 2130 times
1 Answer

Your assumption is incorrect. There is no affinity between a shard and the specific Lambda execution environment that processes its records; any instance may handle any batch from any shard. As for scaling, between 1 and 10 invocations run concurrently per shard, depending on the Parallelization Factor configured on the event source mapping.
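Both the Parallelization Factor and the tumbling window mentioned in option 1 below are settings on the event source mapping. As a rough illustration (stream ARN, function name, and values are placeholders):

```python
import boto3

# Sketch of the relevant event source mapping settings (placeholder values).
lambda_client = boto3.client("lambda")
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    FunctionName="my-processor",
    StartingPosition="LATEST",
    BatchSize=100,
    ParallelizationFactor=1,      # 1-10 concurrent batches per shard
    TumblingWindowInSeconds=60,   # enables the windowed state used in option 1
)
```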

To achieve what you want, you have two options:

  1. When using a tumbling window, you can return a state object from an invocation and receive it back in the next invocation for the same shard (see the first sketch after this list). Note that the state is reinitialized every time a new window starts, so this won't work if the data you need to process spans longer than the window.
  2. Maintain the state outside the function, e.g., in a DynamoDB table: read it at the start of each invocation and write it back at the end (see the second sketch below).
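A minimal sketch of option 1, assuming a Python function with a tumbling window configured on the event source mapping; the per-key counter and the `key` field are illustrative only:

```python
import base64
import json

def handler(event, context):
    # State returned by the previous invocation for this shard and window
    # (empty at the start of a new window).
    state = event.get("state") or {}

    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        key = payload.get("key", "unknown")  # illustrative field name
        state[key] = state.get(key, 0) + 1

    if event.get("isFinalInvokeForWindow"):
        # The window is complete: emit or persist the final aggregate here.
        print("final aggregate for window", event.get("window"), state)
        return {"state": {}}

    # Hand the state to the next invocation in this window.
    return {"state": state}
```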
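And a minimal sketch of option 2; the table name "stream-processing-state" and its shard_id/aggregate attributes are assumptions for illustration:

```python
import base64
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("stream-processing-state")  # assumed table, keyed by shard_id

def handler(event, context):
    # Kinesis record eventIDs look like "shardId-000000000006:<sequence number>".
    shard_id = event["Records"][0]["eventID"].split(":")[0]

    # Read the state the previous invocation for this shard left behind.
    item = table.get_item(Key={"shard_id": shard_id}).get("Item", {})
    state = item.get("aggregate", {})

    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        key = payload.get("key", "unknown")  # illustrative field name
        state[key] = int(state.get(key, 0)) + 1

    # Write the updated state back at the end of the invocation.
    table.put_item(Item={"shard_id": shard_id, "aggregate": state})
```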
AWS EXPERT
answered 8 months ago
AWS EXPERT
reviewed 8 months ago
