By default, Apache Kafka has a maximum message size limit of 1MB. While large messages are generally considered an anti-pattern in Kafka, there are scenarios where you might need to handle messages larger than this default limit. This guide will walk you through the options available and how to implement them specifically in Amazon MSK.
Approaches to handle large messages
Compress your large records
Implementing data compression before sending messages to Amazon MSK offers several significant benefits. It reduces message sizes, allowing for faster transmission and more efficient use of network bandwidth, while also decreasing storage costs and optimizing broker resource utilization. It simplifies message retention management across Kafka topics as well. Popular compression codecs such as GZIP, Snappy, LZ4, and ZSTD can significantly reduce the size of large messages. However, it's important to note that compression introduces some CPU overhead and latency during message production, and its effectiveness varies with the message content and format.
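As an illustration, here is a minimal producer sketch that enables compression; the bootstrap server and topic name are placeholders, and GZIP is just one of the supported codecs:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CompressedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder bootstrap servers for your MSK cluster
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "BOOTSTRAP-SERVER");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Compress batches before they leave the producer; gzip, snappy, lz4, and zstd are supported
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("myTopic", "key", "a large payload ..."));
        }
    }
}
```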
Split one large record into multiple records
When dealing with exceptionally large messages, consider breaking them into smaller chunks at the application level. Each chunk can contain a unique identifier and sequence number to facilitate proper reassembly. The consumer application can then reconstruct the original message by collecting and combining all related segments in the correct order.
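A minimal sketch of the producer-side chunking logic is shown below; the chunk size and header names are illustrative assumptions rather than a standard protocol:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.UUID;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ChunkingExample {
    static final int CHUNK_SIZE = 512 * 1024; // 512KB per chunk (illustrative)

    // Splits one large payload into ordered chunks that share a common message id
    static void sendInChunks(KafkaProducer<String, byte[]> producer, String topic, byte[] payload) {
        String messageId = UUID.randomUUID().toString();
        int totalChunks = (payload.length + CHUNK_SIZE - 1) / CHUNK_SIZE;

        for (int i = 0; i < totalChunks; i++) {
            int start = i * CHUNK_SIZE;
            int end = Math.min(start + CHUNK_SIZE, payload.length);
            byte[] chunk = Arrays.copyOfRange(payload, start, end);

            // Keying by messageId keeps all chunks on one partition so they arrive in order
            ProducerRecord<String, byte[]> record = new ProducerRecord<>(topic, messageId, chunk);
            record.headers().add("chunk-index", Integer.toString(i).getBytes(StandardCharsets.UTF_8));
            record.headers().add("chunk-total", Integer.toString(totalChunks).getBytes(StandardCharsets.UTF_8));
            producer.send(record);
        }
    }
}
```

On the consumer side, records that share the same key can be buffered until all chunk-total segments have arrived, then concatenated in chunk-index order to rebuild the original message.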
Store large records in Amazon S3 with a reference in Kafka
A useful approach for storing large records is to keep the payload in an alternative storage service and publish only a reference to it in Kafka. Amazon S3 stands out as an excellent choice here due to its exceptional durability and cost-effectiveness. The procedure involves uploading the record as an object to an S3 bucket and then writing a reference entry to the Kafka topic. This entry carries an attribute, such as the bucket name and object key, that points to the object's location in Amazon S3. With this approach, you can also generate a pre-signed URL for the S3 object's location and share it with the consumer, giving it direct access to the object without intermediary server-side data transfers.
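A minimal sketch of this pattern using the AWS SDK for Java v2 follows; the bucket name, object key prefix, topic name, and the choice to publish the pre-signed URL as the record value are assumptions to adapt to your application:

```java
import java.time.Duration;
import java.util.UUID;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;

public class S3ReferenceExample {
    // Uploads the large payload to S3 and publishes only a pre-signed URL to Kafka
    static void sendReference(KafkaProducer<String, String> producer, byte[] payload) {
        String bucket = "my-large-messages-bucket"; // placeholder bucket name
        String key = "payloads/" + UUID.randomUUID();

        try (S3Client s3 = S3Client.create(); S3Presigner presigner = S3Presigner.create()) {
            s3.putObject(PutObjectRequest.builder().bucket(bucket).key(key).build(),
                    RequestBody.fromBytes(payload));

            // The pre-signed URL lets the consumer download the object directly from S3
            GetObjectRequest getRequest = GetObjectRequest.builder().bucket(bucket).key(key).build();
            String url = presigner.presignGetObject(GetObjectPresignRequest.builder()
                    .signatureDuration(Duration.ofHours(1))
                    .getObjectRequest(getRequest)
                    .build()).url().toString();

            producer.send(new ProducerRecord<>("myTopic", key, url));
        }
    }
}
```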
Increasing message size limits in Amazon MSK
If you need to send messages larger than 1MB in Amazon MSK, you'll need to modify configurations at multiple levels. Here's how:
1. MSK cluster configuration
To modify cluster-level settings in Amazon MSK (for both MSK Standard brokers and MSK Express brokers):
- Create a new cluster configuration, or update an existing one, and add the configurations below (example: 10MB messages):
message.max.bytes=10485880
replica.fetch.max.bytes=10485880
- Apply this configuration to your cluster.
However, instead of changing message.max.bytes at the cluster level, it is recommended to override it at the topic level. You must still set replica.fetch.max.bytes at the cluster level so that your brokers can replicate large messages correctly.
2. Topic-level configuration
You can modify an existing topic with the command below to allow 10MB messages:
kafka-configs.sh --bootstrap-server BOOTSTRAP-SERVER --alter --entity-type topics --entity-name myTopic --add-config max.message.bytes=10485880
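Alternatively, you can create a new topic with the override already in place. The following sketch uses the Kafka AdminClient; the partition count and replication factor are placeholders to adjust for your cluster:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateLargeMessageTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "BOOTSTRAP-SERVER");

        try (AdminClient admin = AdminClient.create(props)) {
            // Placeholder partition count and replication factor; adjust for your cluster
            NewTopic topic = new NewTopic("myTopic", 3, (short) 3)
                    .configs(Map.of("max.message.bytes", "10485880"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```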
3. Producer-level configuration
You must change the following properties on the producer side so that large messages can be sent (see the configuration sketch after this list).
- max.request.size: The max.request.size parameter (default: 1MB) defines the maximum size of a request the producer will send, which effectively caps the size of a single message. Set it above your largest expected message to prevent failed sends due to size limits; this is crucial for applications that send large payloads.
- buffer.memory: The buffer.memory parameter (default: 32MB) controls the total memory the producer can use to buffer records waiting to be sent. It affects the producer's ability to handle large messages and should be sized appropriately based on message size and volume.
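For example, a producer intended to send messages up to about 10MB might use settings along these lines; the values mirror the 10MB example above and are assumptions to tune for your workload:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class LargeMessageProducerConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "BOOTSTRAP-SERVER");
        // Allow individual requests up to ~10MB, matching the broker/topic limit
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 10485880);
        // Give the producer enough buffer space to hold several large batches in flight (64MB here)
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 67108864L);
        return props;
    }
}
```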
4. Consumer-level configuration
You must change the following properties on the consumer side so that large messages can be consumed (see the configuration sketch after this list).
- fetch.max.bytes: The fetch.max.bytes parameter (default: 50MB) limits the amount of data a consumer can retrieve in a single fetch request. It should be configured based on expected message sizes and is crucial for applications that consume large messages.
- max.partition.fetch.bytes: The max.partition.fetch.bytes parameter (default: 1MB) controls the maximum amount of data returned per partition in a fetch. It helps prevent processing issues with large individual messages and should be sized to accommodate the largest expected message.
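A matching consumer configuration for the 10MB example might look like the following sketch; the group id and fetch sizes are assumptions to adapt to your own message sizes:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class LargeMessageConsumerConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "BOOTSTRAP-SERVER");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "large-message-consumer");
        // Allow each partition to return records up to ~10MB
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 10485880);
        // Cap the total data returned per fetch across all partitions (50MB here)
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 52428800);
        return props;
    }
}
```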
Conclusion
While Amazon MSK offers flexibility in handling larger message sizes, the decision to increase message size limits requires careful consideration of multiple factors. The potential performance impact, increased costs, and infrastructure demands must be weighed against operational requirements. Rather than simply increasing message size limits, organizations should first explore alternative solutions such as message compression or S3 integration. A methodical approach including thorough testing, comprehensive monitoring, and adherence to best practices is essential for successful implementation. Ultimately, the primary goal should be to maintain optimal cluster performance while meeting business needs when modifying message size configurations in Amazon MSK.