Data loss while MSK is in a HEALING state

Hi there, we had about 10 minutes of downtime today while MSK was in a HEALING state. According to the online documentation, however, a HEALING state should not affect the cluster's ability to produce or consume. Is there a reason why we had downtime? Below is our cluster configuration:

auto.create.topics.enable=true
default.replication.factor=3
min.insync.replicas=2
num.io.threads=8
num.network.threads=5
num.partitions=1
num.replica.fetchers=2
replica.lag.time.max.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=true
zookeeper.session.timeout.ms=18000
log.retention.hours=-1

I found these logs on the consumer side.

"error":"Messages are rejected since there are fewer in-sync replicas than required","correlationId":5,"size":57}

Kindly advise if anything is wrong with the configuration. Thanks

asked 2 years ago, 649 views
1 answer

Hi there, how many brokers do you have? If the best practices in (1) are followed, one broker going down should not cause any downtime. When a broker is down, a partition may have one fewer replica than it is configured to have. That is why the replication factor should be greater than min.insync.replicas by 1, i.e. RF = minISR + 1. Based on the error above, it seems you are producing with acks=all, which requires at least 2 replicas to be in sync (per min.insync.replicas=2 in your config), but fewer than 2 replicas were in sync at the time, hence the error.
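
To make the RF = minISR + 1 rule concrete, here is a minimal Python sketch of the arithmetic. The function names are hypothetical and only model the rule described above; they do not call Kafka or MSK.

```python
def tolerable_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Number of replicas that can fall out of sync while acks=all writes still succeed."""
    return max(replication_factor - min_insync_replicas, 0)


def accepts_acks_all_writes(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """acks=all writes are rejected once in-sync replicas drop below min.insync.replicas."""
    return in_sync_replicas >= min_insync_replicas


# Topic with RF=3 and min.insync.replicas=2: one broker can be down.
print(tolerable_broker_failures(3, 2))  # 1
print(accepts_acks_all_writes(2, 2))    # True

# Topic with RF=2 and min.insync.replicas=2: losing one broker leaves 1 in-sync replica.
print(tolerable_broker_failures(2, 2))  # 0
print(accepts_acks_all_writes(1, 2))    # False
```

With the posted config (min.insync.replicas=2), any topic whose actual replication factor is 2 or lower has no headroom, so a single broker being replaced would be enough to produce the "fewer in-sync replicas than required" error.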

Although default.replication.factor=3 is set, it only applies to auto-created topics. If you create topics manually, you have to specify the replication factor at topic creation. Please make sure the RF for your topics is 3 if this is a 3-AZ cluster. If it is a 2-AZ cluster, please decrease min.insync.replicas to 1.
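
One way to find topics that were created with too low a replication factor is sketched below in Python. This is only an illustration: `topics_at_risk` and `fetch_replication_factors` are hypothetical helper names, the sample data is made up, and the fetch step assumes the third-party `confluent-kafka` package and a plaintext bootstrap address (an MSK cluster using TLS or IAM would need extra client settings).

```python
from typing import Dict, List


def topics_at_risk(replicas_by_topic: Dict[str, List[int]], min_insync_replicas: int) -> List[str]:
    """Return topics where any partition has RF below min.insync.replicas + 1."""
    return sorted(
        topic
        for topic, partition_rfs in replicas_by_topic.items()
        if any(rf < min_insync_replicas + 1 for rf in partition_rfs)
    )


def fetch_replication_factors(bootstrap_servers: str) -> Dict[str, List[int]]:
    """Read per-partition replica counts from the cluster (assumes confluent-kafka is installed)."""
    from confluent_kafka.admin import AdminClient  # third-party, imported lazily

    admin = AdminClient({"bootstrap.servers": bootstrap_servers})
    metadata = admin.list_topics(timeout=10)
    return {
        name: [len(partition.replicas) for partition in topic.partitions.values()]
        for name, topic in metadata.topics.items()
    }


# Hypothetical sample: "orders" has RF=3 on both partitions, "audit" was created with RF=2.
sample = {"orders": [3, 3], "audit": [2]}
print(topics_at_risk(sample, min_insync_replicas=2))  # ['audit']
```

Against a live cluster, the same check would be `topics_at_risk(fetch_replication_factors("<bootstrap servers>"), 2)`, where the bootstrap string is whatever your cluster exposes.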

(1) https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html#ensure-high-availability

AWS
SUPPORT ENGINEER
answered 2 years ago
