Understanding Kinesis sequence numbers

0

Couple of questions on Kinesis sequence numbers:

  1. The Kinesis docs state "Sequence numbers for the same partition key generally increase over time". I note the use of the word "generally". Under which circumstances would this ever not be the case? Could we ever consume one message after another from the same shard, where the second message has a smaller sequence number than the first?

  2. If the answer to 1 is yes we can have subsequent messages in a shard with smaller sequence numbers, then what happens in the following scenario?

    Say we consume a batch of records from a single shard in a Kinesis stream, with the following sequence numbers:

    10000 | 10001| 10002 | 10004 | 10003 | 10005

    We successfully process the first 4 records in the batch, and then checkpoint (on seq num 10004). Our application then restarts and so starts consuming from the last checkpoint. Which record do we consume? 10003 or 10005?

2 Answers
1

While it is very rare, there are some edge cases in which the sequence numbers for the same partition key may not increase over time. For example, if a message is delayed due to network issues or server-side errors, it may be assigned a lower sequence number than a message that was sent earlier but arrived later. Similarly, if a message is resent with the same partition key, it may be assigned a different sequence number than the original message.

Review the cautious note in the documentation: "Sequence numbers cannot be used as indexes to sets of data within the same stream. To logically separate sets of data, use partition keys or create a separate stream for each dataset."

In the scenario you described, if the batch of records was processed in order of their sequence numbers, then the application would have processed record 10000, then 10001, then 10002, then 10004, and finally 10003 and 10005. Since the application successfully processed record 10004 and then checkpointed, the next time it starts consuming from that shard, it will start with record 10005. This is because the checkpointing process ensures that the application will not consume any records with a sequence number less than or equal to the one it has already processed. Therefore, in this scenario, the application would consume record 10005 next.

AWS
answered a year ago
  • If that's the case then this sounds like a real problem, potentially risking data loss any time the sequence numbers in a shard are not strictly in ascending order. In the scenario above, this means we would never process record 10003, effectively 'losing' this record. How on earth can we guarantee at-least-once processing with Kinesis then? (without checkpointing every single record)

0

In the scenario you described, if the batch of records was processed in order of their sequence numbers, then the application would have processed record 10000, then 10001, then 10002, then 10004, and finally 10003 and 10005. Since the application successfully processed record 10004 and then checkpointed, the next time it starts consuming from that shard, it will start with record 10005.

But, what happens if the consumer crashes right after consuming and checkpointing 10004, but before consuming 10003? By what you describe, it seems like event 10003 would be lost, since the next time the consumer comes up it would only see 10005.

answered 2 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions