Ordered message delivery to downstream consumers when transient faults occur

Hi,

We are architecting a solution and considering Kinesis. One of the primary reasons is the guaranteed message ordering per shard (SQS FIFO is too slow). Our likely exception handling process would be:

  1. Invocation faults get handled as per event source mapping policy
  2. Application non-transient faults (invalid payload) get caught and manually forwarded to SQS to prevent pointless retries
  3. Application transient faults are uncaught, and event source mapping policy will be configured with maximum message age (e.g. 5 minutes) and then move failed batch metadata to SQS (awaiting replay) and continue.
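
To make steps 2 and 3 concrete, here is a minimal sketch of how the Lambda handler might separate the two fault types. It is an illustration under assumptions, not a definitive implementation: `NonTransientError`, `process`, `make_handler` and the injected `forward_to_dlq` callback are hypothetical names, and the `order_id` check stands in for whatever payload validation applies. Only the event shape (`Records[].kinesis.data`, base64-encoded) follows the standard Kinesis event format.

```python
import base64
import json


class NonTransientError(Exception):
    """Hypothetical marker for faults that can never succeed on retry."""


def process(payload):
    # Hypothetical business logic: reject payloads we cannot handle.
    if not isinstance(payload, dict) or "order_id" not in payload:
        raise NonTransientError("missing order_id")
    return payload["order_id"]


def make_handler(forward_to_dlq, process_fn=process):
    """Build a handler; forward_to_dlq(record, reason) is injected (assumed to send to SQS)."""

    def handler(event, context=None):
        processed = []
        for record in event["Records"]:
            raw = base64.b64decode(record["kinesis"]["data"])
            try:
                payload = json.loads(raw)
                processed.append(process_fn(payload))
            except (ValueError, NonTransientError) as exc:
                # Step 2: invalid payload, so forward it and move on (no retry).
                forward_to_dlq(record, str(exc))
            # Step 3: anything else (network/DB errors) is deliberately not
            # caught, so the whole batch fails and the event source mapping
            # retries it until its configured limits are reached.
        return processed  # returned only to make the sketch easy to inspect

    return handler
```

Injecting `forward_to_dlq` keeps the sketch free of SDK calls; in practice it would wrap an SQS send.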

So, if a network or DB fault occurs, the consumer will be blocked on the current batch until the maximum message age expires. When that happens, the failed batch is set aside and processing continues, so messages can be delivered to downstream consumers (by our Lambda) out of order once that batch is replayed.
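
The effect can be shown with a small simulation. This is a simplified model under assumptions, not Kinesis or Lambda code: `run_shard` and `failing_ids` are hypothetical, each batch is a list of sequence numbers, and a "failing" batch is assumed to exhaust its maximum age, get parked for replay, and be replayed only after the rest of the stream has been processed.

```python
def run_shard(batches, failing_ids):
    """Return the order in which records reach the downstream consumer.

    batches: list of (batch_id, [sequence numbers]) in shard order.
    failing_ids: batch ids assumed to hit a transient fault until they expire.
    """
    delivered = []  # what downstream actually sees, in arrival order
    parked = []     # expired batches awaiting replay

    for batch_id, records in batches:
        if batch_id in failing_ids:
            # The consumer is stuck retrying here until the age limit
            # expires; the batch is then parked and processing moves on.
            parked.append(records)
            continue
        delivered.extend(records)

    # Replay happens later, after newer records were already delivered.
    for records in parked:
        delivered.extend(records)
    return delivered


def is_ordered(seq):
    """True if the sequence numbers never go backwards."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


batches = [(0, [1, 2]), (1, [3, 4]), (2, [5, 6])]
print(run_shard(batches, failing_ids=set()))  # [1, 2, 3, 4, 5, 6]
print(run_shard(batches, failing_ids={1}))    # [1, 2, 5, 6, 3, 4]
```

In the second run records 3 and 4 arrive after 5 and 6, which is the out-of-order case described above.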

Is the best we can achieve to offer each consumer something like an SLA that they can take into account and design around?

Have I missed anything obvious in my conclusion?

TIA

Asked 2 years ago, 231 views
1 Answer

A thought based on the question: Kinesis and SQS operate differently. So when you say "the shard will be blocked" - individual consumers on the shard might choose not to consume the next message in the stream but there's no concept of "blocking". Unlike SQS, messages in the stream are visible to all of the consumers so they can choose what they're going to do with each message - which is great if you have several different processes that need to happen on a single message - you can use different consumers and they don't get in each other's way.
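
As a rough illustration of that difference, the toy model below treats a shard as an append-only log that each consumer reads from its own position. It is a sketch under assumptions, not the Kinesis API: `Shard`, `Consumer`, `put`, `read` and `poll` are hypothetical names chosen for the example.

```python
class Shard:
    """Toy append-only log; reading never removes or hides records."""

    def __init__(self):
        self._records = []

    def put(self, data):
        self._records.append(data)

    def read(self, position, limit):
        return self._records[position:position + limit]


class Consumer:
    """Each consumer tracks its own position, independent of the others."""

    def __init__(self, shard):
        self.shard = shard
        self.position = 0
        self.seen = []

    def poll(self, limit=2):
        batch = self.shard.read(self.position, limit)
        self.seen.extend(batch)
        self.position += len(batch)
        return batch


shard = Shard()
for n in range(1, 6):
    shard.put(n)

fast, slow = Consumer(shard), Consumer(shard)
fast.poll(limit=5)  # reads everything at once
slow.poll(limit=2)  # a slower (or stalled) consumer lags behind
print(fast.seen)    # [1, 2, 3, 4, 5]
print(slow.seen)    # [1, 2]
```

A consumer that stalls only delays itself; the records stay in the log and the other consumer keeps reading.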

So in the Kinesis world, if you want to maintain ordering you can only have a single consumer on the stream (shard, really). If there is a fault the blocking happens in the consumer, not in Kinesis.

Probably not helpful - but I'd question why Kinesis is better than SQS FIFO in this case.

Finally: I'm a little concerned about the comment "SQS FIFO is too slow" - have you tested Kinesis to ensure that it meets your performance requirements?

Given the complexity of the question and the challenges you appear to be facing I'd contact your local AWS Solutions Architect to discuss further...

AWS Expert
Answered 2 years ago
  • Hi there, thanks for replying.

    Regarding SQS FIFO being too slow, it's been a long time since I checked FIFO throughput. Checking now, 3000 msg/sec is a lot higher than it was, so probably not a concern any longer, but I'd like to focus on Kinesis. Yes, we have run soak tests through Kinesis and we're very happy with throughput.

    Regarding the "shard will be blocked" comment: of course, not the shard itself. I meant the consumer will not be able to continue receiving new batches.

    You mention "So in the Kinesis world, if you want to maintain ordering you can only have a single consumer on the stream (shard, really)." This is not what I understand. If multiple consumers are reading from a single shard, they all receive the same data completely independently of one another, but the shard guarantees order, so they all get ordered data?

    Finally, I have reached out to AWS locally. It's just taking some time and I was looking to expedite.

    Cheers
