Managing State in Lambda Functions Processing Kinesis Data Streams

0

Hello everybody,

I am developing an application that processes messages from an Amazon Kinesis Data Stream using AWS Lambda functions. My current approach assumes that the same instance of a Lambda function will handle all messages from a single shard, allowing me to store intermediate data within the function’s temporary storage (e.g., global variables) for the duration of the instance's life.

I am reaching out to confirm whether this assumption is accurate and if this approach to state management within a Lambda function is advisable.

Specifically, I am concerned about the following:

  1. Instance Reuse Across Shard Messages: Can I reliably expect that the same Lambda function instance will process all messages from a particular shard consecutively, allowing for effective use of in-memory storage for state management? Is there a possibility that during the processing of messages from the same shard, one instance may be replaced by another, potentially disrupting the continuity of state? If so, how might this impact the effectiveness of using in-memory storage for state within a single processing sequence?

  2. Implications of Scaling: How does the scaling behavior of Lambda functions, in response to fluctuations in the volume of incoming messages, affect the continuity of processing messages from the same shard? Specifically, I am interested in understanding how Lambda's scaling could impact state management within my application.

To illustrate my concern: If I am processing a sequence of related messages from a single shard, and relying on the assumption that they will be handled by the same instance of my Lambda function, what risks or limitations should I be aware of?

Thank you! Best, Mikhail

Mikhail
질문됨 한 달 전776회 조회
1개 답변
0

Your assumption is incorrect. There is no affinity between a shard and the specific Lambda instance that handles that shard. As to scaling, there will be between 1-10 instances running concurrently per each shard, depending on the configuration of the Parallelization Factor parameter in the event source mapping.

To achieve what you want you have two options:

  1. When using a Time Window you can return from an invocation a state object that you get back in the next invocation in the same shard. Note that the object is initialized every time a new window starts, so this can't be used if the data you need to processes takes longer the the window.
  2. Maintain the state outside the function, e.g., a DynamoDB table, and read it in each invocation and update it at the end of the invocation.
profile pictureAWS
전문가
Uri
답변함 한 달 전
profile pictureAWS
전문가
검토됨 한 달 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인