Although it is rare, there are edge cases in which the sequence numbers for the same partition key do not strictly increase over time. For example, if a put is delayed by network issues or server-side errors, the delayed record may receive a lower sequence number than a record that was sent later but arrived first. Likewise, if a record is resent with the same partition key, it is assigned a new sequence number, different from the original record's.
Note the cautionary statement in the documentation: "Sequence numbers cannot be used as indexes to sets of data within the same stream. To logically separate sets of data, use partition keys or create a separate stream for each dataset."
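To make that caveat concrete, here is a hypothetical check (the function name and the sample values are illustrative, not from any AWS SDK) showing that a batch read from a shard need not be strictly increasing:

```python
def is_strictly_increasing(sequence_numbers):
    """Return True only if each sequence number is greater than the previous one."""
    return all(a < b for a, b in zip(sequence_numbers, sequence_numbers[1:]))

# A batch as it might be delivered from a shard -- note 10004 arrives before 10003:
batch = [10000, 10001, 10002, 10004, 10003, 10005]
print(is_strictly_increasing(batch))  # False: delivery order is not guaranteed
```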
In the scenario you described, if the batch of records was processed in the order it was retrieved from the shard, the application would have processed record 10000, then 10001, 10002, 10004, 10003, and finally 10005. If the application checkpointed right after successfully processing record 10004, then the next time it starts consuming from that shard it will begin at record 10005, because checkpointing guarantees that the application will not re-consume any record with a sequence number less than or equal to the checkpoint. In this scenario, therefore, the application would consume record 10005 next.
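Those checkpoint semantics can be sketched as a toy model. The helper below is hypothetical (real consumption goes through the KCL, which resumes after the checkpointed sequence number on restart), but it shows why resuming from a checkpoint of 10004 skips 10003:

```python
def resume_position(checkpointed_seq, shard_records):
    """Simulate a KCL-style restart: every record with a sequence number
    less than or equal to the checkpoint is skipped."""
    return [r for r in shard_records if r > checkpointed_seq]

# Records in the order they sit in the shard:
shard = [10000, 10001, 10002, 10004, 10003, 10005]

# Checkpoint taken right after successfully processing 10004:
print(resume_position(10004, shard))  # [10005] -- 10003 is skipped as well
```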
But what happens if the consumer crashes right after consuming and checkpointing 10004, but before consuming 10003? Based on what you describe, it seems like record 10003 would be lost, since the next time the consumer comes up it would only see 10005 onward.
If that's the case, then this sounds like a real problem: any time the sequence numbers in a shard are not strictly ascending, we risk data loss. In the scenario above, record 10003 would never be processed, effectively 'losing' it. How on earth can we guarantee at-least-once processing with Kinesis, then (without checkpointing every single record)?
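One common way to get at-least-once semantics without checkpointing every record is to checkpoint only the highest position below which everything has been processed, holding back out-of-order records until the gap closes. The sketch below assumes (unrealistically, purely for illustration) consecutive integer sequence numbers; real Kinesis sequence numbers are large, non-contiguous values, so a real tracker would key on delivery order rather than `+1` arithmetic, and the class name is hypothetical:

```python
class SafeCheckpointer:
    """Hypothetical helper: record which sequence numbers have been
    processed and only advance the checkpoint to the highest value N
    such that everything up to and including N is done."""

    def __init__(self, last_checkpoint):
        self.last_checkpoint = last_checkpoint
        self.pending = set()  # processed, but not yet safe to checkpoint

    def mark_processed(self, seq):
        self.pending.add(seq)
        # Advance the checkpoint while the next expected number is present.
        while self.last_checkpoint + 1 in self.pending:
            self.last_checkpoint += 1
            self.pending.remove(self.last_checkpoint)
        return self.last_checkpoint

cp = SafeCheckpointer(last_checkpoint=9999)
for seq in [10000, 10001, 10002, 10004]:
    cp.mark_processed(seq)
print(cp.last_checkpoint)  # 10002 -- 10004 is held back until 10003 arrives
cp.mark_processed(10003)
print(cp.last_checkpoint)  # 10004 -- the gap closed, so 10003 can never be lost
```

With this approach, a crash right after processing 10004 restarts the consumer from checkpoint 10002, so 10003 and 10004 are redelivered: 10004 may be processed twice, which is exactly the at-least-once trade-off.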