7 Answers
I restored our application to use our own self-hosted Kafka cluster (on shared t3.medium instances).
It is now 12 hours later, and the MSK cluster was left to itself overnight. Two of the three brokers are still timing out.
@AWS: I would have expected that by now your monitoring would at least have noticed the issue and restarted the failing brokers. Please advise on how to proceed.
b-2.….kafka.eu-west-1.amazonaws.com:9092 (id: 2 rack: subnet-0f50b40038ca2bf62) -> (
Produce(0): 0 to 7 [usable: 7],
Fetch(1): 0 to 10 [usable: 10],
ListOffsets(2): 0 to 4 [usable: 4],
Metadata(3): 0 to 7 [usable: 7],
LeaderAndIsr(4): 0 to 1 [usable: 1],
StopReplica(5): 0 [usable: 0],
UpdateMetadata(6): 0 to 4 [usable: 4],
ControlledShutdown(7): 0 to 1 [usable: 1],
OffsetCommit(8): 0 to 6 [usable: 6],
OffsetFetch(9): 0 to 5 [usable: 5],
FindCoordinator(10): 0 to 2 [usable: 2],
JoinGroup(11): 0 to 3 [usable: 3],
Heartbeat(12): 0 to 2 [usable: 2],
LeaveGroup(13): 0 to 2 [usable: 2],
SyncGroup(14): 0 to 2 [usable: 2],
DescribeGroups(15): 0 to 2 [usable: 2],
ListGroups(16): 0 to 2 [usable: 2],
SaslHandshake(17): 0 to 1 [usable: 1],
ApiVersions(18): 0 to 2 [usable: 2],
CreateTopics(19): 0 to 3 [usable: 3],
DeleteTopics(20): 0 to 3 [usable: 3],
DeleteRecords(21): 0 to 1 [usable: 1],
InitProducerId(22): 0 to 1 [usable: 1],
OffsetForLeaderEpoch(23): 0 to 2 [usable: 2],
AddPartitionsToTxn(24): 0 to 1 [usable: 1],
AddOffsetsToTxn(25): 0 to 1 [usable: 1],
EndTxn(26): 0 to 1 [usable: 1],
WriteTxnMarkers(27): 0 [usable: 0],
TxnOffsetCommit(28): 0 to 2 [usable: 2],
DescribeAcls(29): 0 to 1 [usable: 1],
CreateAcls(30): 0 to 1 [usable: 1],
DeleteAcls(31): 0 to 1 [usable: 1],
DescribeConfigs(32): 0 to 2 [usable: 2],
AlterConfigs(33): 0 to 1 [usable: 1],
AlterReplicaLogDirs(34): 0 to 1 [usable: 1],
DescribeLogDirs(35): 0 to 1 [usable: 1],
SaslAuthenticate(36): 0 [usable: 0],
CreatePartitions(37): 0 to 1 [usable: 1],
CreateDelegationToken(38): 0 to 1 [usable: 1],
RenewDelegationToken(39): 0 to 1 [usable: 1],
ExpireDelegationToken(40): 0 to 1 [usable: 1],
DescribeDelegationToken(41): 0 to 1 [usable: 1],
DeleteGroups(42): 0 to 1 [usable: 1]
)
b-1.….kafka.eu-west-1.amazonaws.com:9092 (id: 1 rack: subnet-0e2b1a418025f9b10) -> ERROR: org.apache.kafka.common.errors.DisconnectException
b-3.….kafka.eu-west-1.amazonaws.com:9092 (id: 3 rack: subnet-000880b697986a0d8) -> ERROR: org.apache.kafka.common.errors.DisconnectException
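For anyone who wants to watch for this state automatically, here is a minimal Python sketch that parses output in the format shown above and reports which brokers returned an error. It assumes the text has exactly this layout (it looks like what Kafka's `kafka-broker-api-versions.sh` tool prints). The function name `summarize_brokers`, the regular expression, and the sample host names are hypothetical and only for illustration; they are not part of any Kafka or AWS tooling.

```python
import re

# Matches the first line of each broker entry, for example:
#   host:9092 (id: 2 rack: subnet-...) -> (
#   host:9092 (id: 1 rack: subnet-...) -> ERROR: <exception class>
BROKER_LINE = re.compile(
    r"^(?P<host>\S+):(?P<port>\d+) "
    r"\(id: (?P<id>\d+) rack: (?P<rack>\S+)\) -> (?P<rest>.*)$"
)


def summarize_brokers(output):
    """Map each broker id to "ok" or to the error class it reported."""
    status = {}
    for line in output.splitlines():
        match = BROKER_LINE.match(line.strip())
        if not match:
            # API version rows and the closing ")" are not broker lines.
            continue
        broker_id = int(match.group("id"))
        rest = match.group("rest")
        if rest.startswith("ERROR:"):
            status[broker_id] = rest[len("ERROR:"):].strip()
        else:
            status[broker_id] = "ok"
    return status


# Placeholder sample in the same layout; host and subnet names are made up.
SAMPLE = """\
b-2.example:9092 (id: 2 rack: subnet-a) -> (
Produce(0): 0 to 7 [usable: 7],
DeleteGroups(42): 0 to 1 [usable: 1]
)
b-1.example:9092 (id: 1 rack: subnet-b) -> ERROR: org.apache.kafka.common.errors.DisconnectException
"""

if __name__ == "__main__":
    for broker_id, state in sorted(summarize_brokers(SAMPLE).items()):
        print(broker_id, state)
```

A list of failing brokers can then be built with `[b for b, s in summarize_brokers(text).items() if s != "ok"]` and fed into whatever alerting is already in place.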
Answered 5 years ago
Hi Ankon, we are looking into this issue. Did you cut a trouble ticket with our support team?
Answered 5 years ago
I have now received a reply from AWS support, essentially saying:
- This is a known issue with Kafka 2.1.0: a deadlock described in https://issues.apache.org/jira/browse/KAFKA-7697
- They say they work around the issue by restarting affected brokers (see https://forums.aws.amazon.com/thread.jspa?messageID=895763#895763)
- They suggest using 1.1.1 for production and advise against using 2.1.0 in production.
- They are investigating fix options, including providing newer Kafka versions, but there are no ETAs.
I hope this helps anyone else reading through this forum.
Answered 5 years ago
Thanks for the follow-up. We are working on supporting 2.2.1, which does not have this known issue.
Answered 5 years ago