Data loss while MSK is in a HEALING state.
Hi there, we had about 10 minutes of downtime today while MSK was in a HEALING state. However, according to the documentation available online, a HEALING state should not affect the cluster's ability to produce or consume. Is there a reason why we had downtime? Below is our cluster configuration:
auto.create.topics.enable=true
default.replication.factor=3
min.insync.replicas=2
num.io.threads=8
num.network.threads=5
num.partitions=1
num.replica.fetchers=2
replica.lag.time.max.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=true
zookeeper.session.timeout.ms=18000
log.retention.hours=-1
I found these logs on the consumer side.
"error":"Messages are rejected since there are fewer in-sync replicas than required","correlationId":5,"size":57}
Kindly advise if anything is wrong with the configuration. Thanks
Hi there, how many brokers do you have? If the following best practices are followed (1), one broker being down should not cause any downtime. When one broker is down, a partition can have one fewer replica than it is configured to have. That is why the replication factor should be greater than min.insync.replicas by 1, i.e. RF = minISR + 1. Based on the above error, it seems you are producing with acks=all, which requires at least 2 replicas to be in sync (based on min.insync.replicas=2 in your config), but fewer than 2 replicas were in sync at the time, hence the error.
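To make the RF = minISR + 1 rule concrete, here is a minimal sketch of the arithmetic. The function name and parameters are made up for illustration, and it assumes the best case where every replica that is not on a failed broker is still in sync.

```python
def accepts_acks_all_writes(replication_factor: int,
                            min_insync_replicas: int,
                            replicas_down: int) -> bool:
    """Return True if a partition can still accept acks=all produce requests.

    Assumption (best case): every replica that is not on a failed broker
    is in sync, so the ISR size is replication_factor - replicas_down.
    """
    in_sync = max(replication_factor - replicas_down, 0)
    return in_sync >= min_insync_replicas


if __name__ == "__main__":
    # min.insync.replicas=2 as in the posted config, one broker healing.
    for rf in (3, 2, 1):
        ok = accepts_acks_all_writes(rf, min_insync_replicas=2, replicas_down=1)
        print(f"RF={rf}, minISR=2, one replica down -> writes accepted: {ok}")
```

With min.insync.replicas=2, only a topic with RF=3 keeps accepting acks=all writes while one replica is unavailable. A topic with RF=2 or lower is rejected with the "fewer in-sync replicas than required" error.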
Even though default.replication.factor=3 is set, it only applies to auto-created topics. If you create topics manually, you have to specify the replication factor at topic creation. Please make sure the RF for your topics is 3 if it is a 3-AZ cluster. If it is a 2-AZ cluster, please decrease the value of min.insync.replicas to 1.
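As a rough way to audit this, the sketch below flags topics whose replication factor is lower than minISR + 1. The topic names and replication factors are hypothetical. In practice you would fill the mapping from the output of your usual topic describe tooling.

```python
from typing import Dict, List


def topics_at_risk(topic_rf: Dict[str, int], min_insync_replicas: int) -> List[str]:
    """Return topics (sorted by name) whose RF is below minISR + 1.

    Such topics cannot lose a single replica without rejecting
    acks=all produce requests.
    """
    required_rf = min_insync_replicas + 1
    return sorted(name for name, rf in topic_rf.items() if rf < required_rf)


if __name__ == "__main__":
    # Hypothetical topics: "orders" was auto-created (RF=3 from the default),
    # the other two were created manually with a lower RF.
    example = {"orders": 3, "payments": 2, "audit-log": 1}
    print(topics_at_risk(example, min_insync_replicas=2))
```

Any topic this returns would need its replication factor raised, or min.insync.replicas lowered for that topic, before a single broker outage stops causing produce errors.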