Do Kafka security patches cause data loss?


Hi there, we recently got a notification about a security patch update for our Amazon MSK (Kafka) cluster. We have brokers spread across 3 AZs to avoid data loss, in line with AWS best practices. After the patching completed, we realised that Kafka had been shut down during the patching. See the CloudWatch log below.


Here is the cluster configuration. Please let me know if I'm missing anything.



Hi there, security patching triggers a rolling reboot of the brokers. During this time, partition leadership moves from one broker to another as brokers are restarted. Clients may see connection errors (connection refused or timeout) or errors saying the leader is not the same as before, but they then request metadata again to find the correct leader and automatically retry operations against the other available brokers. This can show up as client-side latency, but it does not affect the functionality of the client and won't cause any data loss as long as the best practices below are followed:

  1. Ensure the topic replication factor (RF) is at least 2 for two-AZ clusters and at least 3 for three-AZ clusters. An RF of 1 can lead to offline partitions during patching.
  2. Set minimum in-sync replicas (minISR) to at most RF - 1, so that the partition replica set can tolerate one replica being offline or under-replicated.
  3. Ensure clients are configured to use multiple broker connection strings. Having multiple brokers in a client’s connection string allows for failover if a specific broker supporting client I/O begins to be patched.
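
As a rough illustration, the three practices above can be written as a small pre-patching check. This is only a sketch: the function name check_patch_readiness and the broker hostnames are hypothetical, and the values are passed in by hand rather than read from a real cluster.

```python
def check_patch_readiness(az_count, replication_factor, min_isr, bootstrap_servers):
    """Return a list of warnings; an empty list means all three practices are met.

    Hypothetical helper for illustration only. It does not talk to Kafka or MSK.
    """
    warnings = []

    # Practice 1: RF of at least 2 for two-AZ clusters, at least 3 for three-AZ clusters.
    required_rf = 3 if az_count >= 3 else 2
    if replication_factor < required_rf:
        warnings.append(
            f"RF={replication_factor} is below {required_rf} for a {az_count}-AZ cluster"
        )

    # Practice 2: minISR must be at most RF - 1 so one replica can be offline.
    if min_isr > replication_factor - 1:
        warnings.append(
            f"minISR={min_isr} exceeds RF - 1 = {replication_factor - 1}"
        )

    # Practice 3: the client connection string should list more than one broker.
    brokers = [b.strip() for b in bootstrap_servers.split(",") if b.strip()]
    if len(brokers) < 2:
        warnings.append("connection string lists fewer than 2 brokers")

    return warnings


# Example with made-up broker addresses: a 3-AZ cluster with RF=3 and minISR=2.
brokers = "b-1.example.internal:9092,b-2.example.internal:9092,b-3.example.internal:9092"
print(check_patch_readiness(3, 3, 2, brokers))
```

With RF=1, by contrast, both the first and second checks would produce a warning, which matches the offline-partition risk described in point 1.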

Since you have 3 AZs, please use RF=3 and minISR=2 (this is already set correctly in your config). On the producer side, please make sure you have enough retries configured. Since it can take a few milliseconds for leadership to transfer to another broker, you can set the retry backoff (retry.backoff.ms) to 50-100 ms so that the producer waits briefly before retrying.
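
For example, the producer-side advice could look roughly like the settings below. This is a sketch under a few assumptions: build_producer_config is a hypothetical helper, the broker hostnames are made up, the retry count of 10 is just a placeholder, and acks=all is my own addition (minISR is only enforced for producers that use acks=all). The keys themselves are standard Kafka producer configuration names.

```python
def build_producer_config(bootstrap_servers, retries=10, retry_backoff_ms=100):
    """Build a producer settings dict that tolerates a rolling broker restart.

    Hypothetical helper for illustration; the default values are assumptions.
    """
    return {
        # List several brokers so the client can fail over while one is being patched.
        "bootstrap.servers": ",".join(bootstrap_servers),
        # Assumption, not from the answer above: wait for all in-sync replicas,
        # so that the minISR setting actually applies to writes.
        "acks": "all",
        # Enough retries to ride out a leader change.
        "retries": retries,
        # Wait 50-100 ms between retries while leadership moves to another broker.
        "retry.backoff.ms": retry_backoff_ms,
    }


# Made-up broker addresses for a 3-AZ cluster.
config = build_producer_config(
    [
        "b-1.example.internal:9092",
        "b-2.example.internal:9092",
        "b-3.example.internal:9092",
    ]
)
print(config)
```

A dict like this can typically be passed to a Kafka client library's producer constructor, but please check the exact option names and defaults against the client you are using.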

