Data loss while MSK is in a HEALING state


Hi there, we had about 10 minutes of downtime today while MSK was in a HEALING state. According to the documentation, a HEALING state should not affect the cluster's ability to produce or consume. Is there a reason why we had downtime? Below is our cluster configuration:

auto.create.topics.enable=true
default.replication.factor=3
min.insync.replicas=2
num.io.threads=8
num.network.threads=5
num.partitions=1
num.replica.fetchers=2
replica.lag.time.max.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=true
zookeeper.session.timeout.ms=18000
log.retention.hours=-1

I found these logs on the consumer side.

"error":"Messages are rejected since there are fewer in-sync replicas than required","correlationId":5,"size":57}

Kindly advise whether anything is wrong with the configuration. Thanks.

asked 2 years ago, 641 views
1 Answer

Hi there, how many brokers do you have? If the best practices in (1) are followed, losing one broker should not cause any downtime. When a broker is down, a partition can have one fewer replica than it is configured to have. That is why the replication factor should be greater than min.insync.replicas by 1, i.e. RF = minISR + 1. Based on the error above, it seems your producers use acks=all, which requires at least 2 replicas to be in sync (because of min.insync.replicas=2 in your config), and fewer than 2 replicas were in sync at the time, hence the error. With RF=3, a single broker outage would still leave 2 in-sync replicas, so either the affected topic has a lower replication factor or more than one of its replicas fell out of sync.
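The RF = minISR + 1 rule can be checked with a small Python model. This is a simplified sketch, not Kafka's actual implementation: it assumes each unavailable broker holds exactly one replica of the partition and that the partition leader itself stays available. `acks_all_write_accepted` is a hypothetical helper, not part of any Kafka client. Note that min.insync.replicas is only enforced for producers using acks=all.

```python
def acks_all_write_accepted(
    replication_factor: int, min_insync_replicas: int, replicas_unavailable: int
) -> bool:
    """Simplified model: with acks=all, the leader rejects a write when the
    in-sync replica set (ISR) is smaller than min.insync.replicas."""
    # Assumption: every unavailable broker removes exactly one replica from the ISR.
    in_sync = max(replication_factor - replicas_unavailable, 0)
    return in_sync >= min_insync_replicas


if __name__ == "__main__":
    # RF = minISR + 1: one broker down still leaves 2 in-sync replicas.
    print(acks_all_write_accepted(3, 2, 1))  # True
    # RF = minISR: one broker down leaves 1 in-sync replica, matching the error above.
    print(acks_all_write_accepted(2, 2, 1))  # False
    # Even RF = 3 fails if two replicas fall out of sync at the same time.
    print(acks_all_write_accepted(3, 2, 2))  # False
```

The second case is the one the error message points to: a topic whose RF equals min.insync.replicas has no headroom for a single broker going through HEALING.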

Note that default.replication.factor=3 only applies to auto-created topics. If you create topics manually, you have to specify the replication factor when creating each topic. Please make sure the RF for your topics is 3 if this is a 3-AZ cluster. If it is a 2-AZ cluster, please decrease min.insync.replicas to 1.
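Because manually created topics do not pick up default.replication.factor, it can help to audit each topic against the guidance above. The following is a hedged Python sketch: `TopicInfo` and `audit_topics` are hypothetical, and the topic names and replication factors are made-up examples. In practice you would fill them in from your own topic descriptions (for example the output of `kafka-topics.sh --describe`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicInfo:
    """Hypothetical record of one topic's replication settings."""
    name: str
    replication_factor: int
    # Defaults to the broker-level min.insync.replicas=2 from the question;
    # a topic-level override would replace it.
    min_insync_replicas: int = 2


def audit_topics(topics: list[TopicInfo], az_count: int) -> list[str]:
    """Return one finding per topic that does not follow the RF guidance above."""
    findings: list[str] = []
    for t in topics:
        if t.replication_factor < t.min_insync_replicas + 1:
            # RF must exceed min.insync.replicas by at least 1, otherwise a
            # single unavailable replica makes acks=all writes fail.
            findings.append(
                f"{t.name}: RF={t.replication_factor} is not greater than "
                f"min.insync.replicas={t.min_insync_replicas}"
            )
        elif az_count == 3 and t.replication_factor < 3:
            # On a 3-AZ cluster the answer recommends RF=3 for every topic.
            findings.append(
                f"{t.name}: RF={t.replication_factor} on a 3-AZ cluster, expected 3"
            )
    return findings


if __name__ == "__main__":
    topics = [
        TopicInfo("orders", 3),    # e.g. auto-created, picks up default.replication.factor=3
        TopicInfo("payments", 2),  # e.g. created manually with an explicit RF of 2
    ]
    for finding in audit_topics(topics, az_count=3):
        print(finding)  # payments: RF=2 is not greater than min.insync.replicas=2
```

On a 2-AZ cluster, lowering min.insync.replicas to 1 (as suggested above) makes a topic with RF=2 pass the first check.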

(1) https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html#ensure-high-availability

AWS
SUPPORT ENGINEER
answered 2 years ago
