Data loss while MSK is in a HEALING state

Hi there, we had about 10 minutes of downtime today while MSK was in a HEALING state. According to the online documentation, a HEALING state should not affect the cluster's ability to produce or consume. Is there a reason why we had downtime? Below is our cluster configuration:

auto.create.topics.enable=true
default.replication.factor=3
min.insync.replicas=2
num.io.threads=8
num.network.threads=5
num.partitions=1
num.replica.fetchers=2
replica.lag.time.max.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=true
zookeeper.session.timeout.ms=18000
log.retention.hours=-1

I found these logs on the consumer side.

"error":"Messages are rejected since there are fewer in-sync replicas than required","correlationId":5,"size":57}

Kindly advise if anything is wrong with the configuration. Thanks

Asked 2 years ago, viewed 649 times
1 Answer

Hi there, how many brokers do you have? If the high-availability best practices (1) are followed, a single broker going down should not cause any downtime. When one broker is down, a partition can have one fewer replica than it is configured to have. That is why the replication factor should be greater than min.insync.replicas by 1, i.e. RF = minISR + 1. Based on the error above, it seems you have acks=all, which requires at least 2 replicas to be in sync (based on min.insync.replicas=2 in your config), but fewer than 2 replicas were in sync at the time, hence the error.
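To make the RF = minISR + 1 rule concrete, here is a minimal Python sketch of the arithmetic. It is a simplified model, not Kafka's actual ISR logic: it assumes the worst case, where every unavailable broker hosted one replica of the partition, and the function names are made up for illustration.

```python
def in_sync_replicas_left(replication_factor: int, brokers_down: int) -> int:
    """Worst case: every unavailable broker hosted one replica of the partition."""
    return max(replication_factor - brokers_down, 0)


def accepts_acks_all_writes(
    replication_factor: int, min_insync_replicas: int, brokers_down: int
) -> bool:
    """With acks=all, a write is accepted only while ISR size >= min.insync.replicas."""
    remaining = in_sync_replicas_left(replication_factor, brokers_down)
    return remaining >= min_insync_replicas


if __name__ == "__main__":
    # Compare a topic created with RF=2 against one created with RF=3,
    # both with min.insync.replicas=2 and a single broker unavailable.
    for rf in (2, 3):
        ok = accepts_acks_all_writes(rf, min_insync_replicas=2, brokers_down=1)
        print(f"RF={rf}, min.insync.replicas=2, one broker down: writes accepted = {ok}")
```

Under this model, a topic with RF=2 and min.insync.replicas=2 rejects acks=all writes as soon as one broker is unavailable, which would match the error in the question, while a topic with RF=3 keeps accepting them.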

Even though default.replication.factor=3 is set, it only applies to auto-created topics. If you create topics manually, you have to specify the replication factor at topic creation. Please make sure the RF for your topics is 3 if it is a 3-AZ cluster. If it is a 2-AZ cluster, please decrease the value of min.insync.replicas to 1.
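As a rough illustration of specifying the replication factor explicitly, the Python sketch below builds a `kafka-topics.sh --create` command line. The helper name, the topic name, and the bootstrap address are hypothetical placeholders, and a real MSK cluster may also need client security settings (for example via `--command-config`), which are omitted here.

```python
import shlex


def create_topic_command(
    topic: str,
    bootstrap_server: str,
    partitions: int = 1,
    replication_factor: int = 3,
    min_insync_replicas: int = 2,
) -> list[str]:
    """Build a kafka-topics.sh invocation with an explicit replication factor."""
    # Refuse settings where losing a single broker would block acks=all writes.
    if replication_factor <= min_insync_replicas:
        raise ValueError("replication factor should exceed min.insync.replicas")
    return [
        "kafka-topics.sh", "--create",
        "--bootstrap-server", bootstrap_server,
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
        "--config", f"min.insync.replicas={min_insync_replicas}",
    ]


if __name__ == "__main__":
    # "orders" and the broker address are placeholders, not values from the question.
    cmd = create_topic_command("orders", "b-1.example.internal:9092")
    print(shlex.join(cmd))
```

For a 2-AZ cluster, the same helper could be called with `replication_factor=2, min_insync_replicas=1`, in line with the advice above.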

(1) https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html#ensure-high-availability

AWS
Support Engineer
Answered 2 years ago
