
What is the right way to publish events to MSK Topic from SQS for High Volume and Throughput use case

0

What is the right way to publish events from SQS to an MSK topic for a high-volume, high-throughput use case? I am exploring two approaches:

  1. Lambda: however, it becomes costly at higher scale.
  2. MSK Connect: there is no official connector available from AWS. There is an open-source connector written in Java, which would need to be vetted for an enterprise environment.

Please share your thoughts.

3 Answers
1

For high volume and throughput use cases when publishing events from Amazon SQS to an MSK topic, both Lambda and MSK Connect have their pros and cons. Let's examine each approach:

  1. AWS Lambda: Lambda can indeed be used to consume messages from SQS and publish them to MSK. With recent improvements, Lambda now supports an up to 5x faster polling scale-up rate for SQS event sources, adding up to 300 concurrent executions per minute, up to a maximum of 1,250 concurrent executions. This enhancement enables more responsive scaling during sudden bursts of messages in SQS queues.

However, as you correctly pointed out, Lambda can become costly at higher scales, especially for consistently high-volume workloads. While it offers excellent scalability and ease of use, its pricing model (a charge per request plus a charge for execution duration) may not be the most cost-effective for very high-throughput scenarios.

  2. MSK Connect: MSK Connect is also a viable option for connecting SQS to MSK. While there isn't an official AWS-managed connector specifically for SQS to MSK, you can use open-source connectors or develop custom ones.

The open-source Java-based connector you mentioned could be a good starting point. However, as you noted, it would need to be thoroughly vetted and possibly customized for an enterprise environment. This approach gives you more control but requires more effort in terms of development, testing, and maintenance.

Considering your high volume and throughput requirements, MSK Connect might be the better choice for long-term, consistent high-volume data transfer. It's designed for scalable, reliable data streaming and can handle high throughput more cost-effectively than Lambda for sustained workloads.

That said, the right choice depends on your specific use case, including:

  1. Exact volume and throughput requirements
  2. Consistency of the workload (steady high volume vs. occasional spikes)
  3. Available resources for development and maintenance
  4. Specific latency requirements
  5. Total cost of ownership considerations

If your workload is consistently high volume, MSK Connect would likely be more cost-effective and scalable in the long run, despite the initial development effort. If your workload is more variable with occasional spikes, Lambda's new scaling capabilities might be sufficient and easier to implement.
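
To make the cost comparison concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function names are made up for illustration, and every price is a parameter you would need to fill in from the current AWS pricing pages for your region. It assumes Lambda is billed per request plus per GB-second of duration, and MSK Connect per MCU-hour, and it ignores SQS, data transfer, and MSK broker costs.

```python
def lambda_monthly_cost(messages_per_second, batch_size, avg_duration_s, memory_gb,
                        price_per_million_requests, price_per_gb_second,
                        seconds_per_month=30 * 24 * 3600):
    """Estimate the monthly Lambda bill for draining an SQS queue.

    One invocation is assumed to process one full batch of `batch_size` messages.
    All prices are inputs; none of them are real AWS prices.
    """
    invocations = messages_per_second * seconds_per_month / batch_size
    request_cost = invocations / 1_000_000 * price_per_million_requests
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost


def msk_connect_monthly_cost(mcu_count, price_per_mcu_hour, hours_per_month=30 * 24):
    """Estimate the monthly MSK Connect bill for a fixed number of MCUs."""
    return mcu_count * price_per_mcu_hour * hours_per_month
```

Plugging in your measured batch size, average invocation duration, and the MCU count from a load test gives a first approximation of the break-even point between the two options.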

You might also consider a hybrid approach, using Lambda for lower-volume periods or for rapid prototyping, and transitioning to MSK Connect as your volume grows or for your most high-throughput use cases.

Ultimately, it's recommended to prototype both approaches with your actual workload to determine which best meets your performance, cost, and operational requirements.
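
For the Lambda side of such a prototype, a minimal handler sketch in Python might look like the following. It assumes the function is triggered by an SQS event source mapping with partial batch responses (ReportBatchItemFailures) enabled. `forward_sqs_batch` and the `send` callable are hypothetical names: `send` stands in for whatever Kafka producer client you choose, and is assumed to block until the broker acknowledges the record and to raise on failure.

```python
from typing import Callable, Dict, List


def forward_sqs_batch(event: Dict, send: Callable[[bytes, bytes], None]) -> Dict:
    """Forward each SQS record in a Lambda event to Kafka via send(key, value).

    Returns the partial batch response shape, so only the messages that
    failed to publish are returned to the queue for retry.
    """
    failures: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        message_id = record["messageId"]
        try:
            # Using the SQS message ID as the Kafka key is an arbitrary choice here.
            send(message_id.encode("utf-8"), record["body"].encode("utf-8"))
        except Exception:
            # Report this message as failed; the rest of the batch still succeeds.
            failures.append({"itemIdentifier": message_id})
    return {"batchItemFailures": failures}
```

In a real function, the producer would be created once outside the handler so it is reused across invocations, and the handler would return the result of `forward_sqs_batch(event, send)`.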
Sources
AWS Lambda supports faster polling scale-up rate for Amazon SQS as an event source
Use cases - Streaming Data Solution for Amazon MSK

answered a year ago
1

I just checked out this open source project: https://github.com/Nordstrom/kafka-connect-sqs/tree/master

I reviewed the code, and it seems to support high concurrency using provisioned tasks per running connector.

SQS supports about 120K in-flight messages with standard queues. The connector receives up to 10 messages per task per poll, which matches the maximum batch size of a single SQS ReceiveMessage call. Technically, you could run hundreds of tasks, but I suggest running performance tests to size the MSK Connect deployment (MCUs) and decide how to scale it.
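
As a starting point before a load test, the task count can be roughly estimated like this in Python. This is only a sketch under simplifying assumptions: `tasks_needed` is a hypothetical helper, each task is assumed to poll sequentially, every poll is assumed to return a full batch of 10 messages, and `poll_round_trip_s` (the time to receive, publish, and delete one batch) is something you would have to measure yourself.

```python
import math


def tasks_needed(target_msgs_per_s, poll_round_trip_s, msgs_per_poll=10):
    """Estimate how many connector tasks are needed for a target throughput.

    Assumes one task completes one poll cycle every `poll_round_trip_s`
    seconds and always receives `msgs_per_poll` messages.
    """
    if target_msgs_per_s <= 0 or poll_round_trip_s <= 0 or msgs_per_poll <= 0:
        raise ValueError("all inputs must be positive")
    per_task_msgs_per_s = msgs_per_poll / poll_round_trip_s
    return math.ceil(target_msgs_per_s / per_task_msgs_per_s)


# Example: 5,000 msg/s with a measured 0.5 s round trip per batch of 10.
print(tasks_needed(5000, 0.5))  # 250
```

Real throughput will be lower when polls return partial batches, so treat the result as a lower bound on the number of tasks and confirm it with a performance test.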

AWS
answered a year ago
0

Hi,

Have you considered an EventBridge Pipe? See https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-pipes.html

SQS is a supported pipe source: see https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-pipes-sqs.html

Kinesis is a possible target: see https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-pipes-event-target.html

I don't know of a way to use MSK directly as a pipe target, but you could try the Kafka-Kinesis-Connector: see https://repost.aws/knowledge-center/kinesis-kafka-connector-msk

EXPERT
answered a year ago
  • Thanks for the response, Didier. However, the Kinesis Kafka connector seems to work the other way around: Kafka is the source and Kinesis is the target. I need to publish messages to MSK.
