How do I troubleshoot errors I receive when I try to connect to my Amazon MSK cluster?
I receive errors when I try to connect to my Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster.
Resolution
Errors that are not related to a specific authentication type
When you try to connect to your Amazon MSK cluster, you might receive one of the following errors that's not related to the authentication type that you use.
java.lang.OutOfMemoryError: Java heap space
You receive the OutOfMemoryError error when you run a command for cluster operations without the client properties file. To resolve this issue, pass a client.properties file that includes the appropriate properties for your authentication type.
Example command with only an AWS Identity and Access Management (IAM) authentication port:
./kafka-topics.sh --create --bootstrap-server $BOOTSTRAP:9098 --replication-factor 3 --partitions 1 --topic TestTopic
Example command with an IAM authentication port and the client properties file:
./kafka-topics.sh --create --bootstrap-server $BOOTSTRAP:9098 --command-config client.properties --replication-factor 3 --partitions 1 --topic TestTopic
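For reference, a client.properties file for IAM authentication typically contains settings like the following. The callback handler classes come from the aws-msk-iam-auth library, which must be on the client's classpath:
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler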
org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics
You might receive the TimeoutException error when there's a network misconfiguration between the client application and Amazon MSK cluster.
To troubleshoot this issue, run the following connectivity test from the client machine:
telnet bootstrap-broker port-number
Note: Replace bootstrap-broker with one of the broker addresses from your Amazon MSK cluster. Replace port-number with the appropriate port value based on the authentication that's turned on for your cluster.
If the client machine can access the brokers, then there are no connectivity issues. If the client machine can't access the brokers, then review the network connectivity. Check the inbound and outbound rules for the security group.
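For example, you can review a security group's rules with the AWS CLI. The security group ID in the following command is a placeholder:
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0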
org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [test_topic]
You receive the TopicAuthorizationException error when you use IAM authentication and your access policy blocks topic operations, such as WriteData and ReadData.
Note: Permission boundaries and service control policies (SCPs) also block a user's attempt to connect to the cluster without the required authorization.
If you use an authentication type other than IAM, then check whether topic-level access control lists (ACLs) block the operations.
Run the following command to list the ACLs that are applied on a topic:
bin/kafka-acls.sh --bootstrap-server $BOOTSTRAP:PORT --command-config adminclient-configs.conf --list --topic testtopic
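If a deny rule or a missing allow rule blocks the operation, then you can add an allow ACL. The following command is a sketch; the principal name is a placeholder whose format depends on your authentication type:
bin/kafka-acls.sh --bootstrap-server $BOOTSTRAP:PORT --command-config adminclient-configs.conf --add --allow-principal "User:CN=client.example.com" --operation Read --operation Write --topic testtopic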
ZooKeeperClientTimeoutException
You might receive the ZooKeeperClientTimeoutException error when the client tries to connect to the cluster through the Apache ZooKeeper string, and the connection can't be established. You might also receive this error when the Apache ZooKeeper string is incorrect.
Example of an incorrect Apache ZooKeeper string:
./kafka-topics.sh --zookeeper z-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:2181,z-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:2181,z-3.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:2181 --list
Example output:
[2020-04-10 23:58:47,963] WARN Client session timed out, have not heard from server in 10756ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
[2020-04-10 23:58:58,581] WARN Client session timed out, have not heard from server in 10508ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
[2020-04-10 23:59:08,689] WARN Client session timed out, have not heard from server in 10004ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
Exception in thread "main" kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
	at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:259)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
	at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:255)
	at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:113)
	at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1858)
	at kafka.admin.TopicCommand$ZookeeperTopicService$.apply(TopicCommand.scala:321)
	at kafka.admin.TopicCommand$.main(TopicCommand.scala:54)
	at kafka.admin.TopicCommand.main(TopicCommand.scala)
To resolve this issue, take the following actions:
- Verify that you're using the correct Apache ZooKeeper string. You can retrieve the string with the AWS CLI, as shown after this list.
- Make sure that the security group for your Amazon MSK cluster allows inbound traffic from the client's security group on the Apache ZooKeeper ports.
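For example, the following command retrieves the Apache ZooKeeper connection string for a cluster. The cluster ARN is a placeholder:
aws kafka describe-cluster --cluster-arn <cluster-arn> --query 'ClusterInfo.ZookeeperConnectString'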
Broker might be unavailable
"Topic 'topicName' not present in metadata after 60000 ms. or Connection to node -<node-id> (<broker-host>/<broker-ip>:<port>) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)"
You might receive the preceding error when one of the following are true:
- The producer or consumer can't connect to the broker host and port.
- The broker string is incorrect.
If you receive this error even though the client or broker connectivity initially worked, then the broker might be unavailable.
You might also receive this error when you use the broker string to produce data to the cluster from outside the virtual private cloud (VPC).
Example of a producer broker string:
./kafka-console-producer.sh --broker-list b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9092,b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9092 --topic test
Example output:
[2020-04-10 23:51:57,668] ERROR Error when sending message to topic test with key: null, value: 1 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Topic test not present in metadata after 60000 ms.
Example of a consumer broker string:
./kafka-console-consumer.sh --bootstrap-server b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9092,b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9092 --topic test
Example output:
[2020-04-11 00:03:21,157] WARN [Consumer clientId=consumer-console-consumer-88994-1, groupId=console-consumer-88994] Connection to node -1 (b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com/172.31.6.19:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2020-04-11 00:04:36,818] WARN [Consumer clientId=consumer-console-consumer-88994-1, groupId=console-consumer-88994] Connection to node -2 (b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com/172.31.44.252:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2020-04-11 00:05:53,228] WARN [Consumer clientId=consumer-console-consumer-88994-1, groupId=console-consumer-88994] Connection to node -1 (b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com/172.31.6.19:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
To troubleshoot the issue, take the following actions:
- Make sure that you use the correct broker string and port.
- If the broker is unavailable, then check the ActiveControllerCount Amazon CloudWatch metric to verify that the controller was active during the period. If the metric's value isn't 1, then one of the brokers in the cluster might be unavailable. See the example command after this list.
- Check the ZooKeeperSessionState metric to confirm that the brokers are in continual communication with the Apache ZooKeeper nodes.
- To understand why the broker failed, check the KafkaDataLogsDiskUsed metric to determine whether the broker ran out of storage space. For more information, see Amazon MSK metrics for monitoring Standard brokers with CloudWatch.
- Check whether the network configuration caused the issue. Amazon MSK resources are provisioned within the VPC. You must connect to the Amazon MSK cluster, or produce and consume from the cluster, over a private network in the same VPC. For more information, see Unable to access cluster from within AWS: networking issues and How do I connect to my Amazon MSK cluster from inside AWS network but outside the cluster's Amazon VPC?
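For example, the following AWS CLI sketch retrieves the ActiveControllerCount metric for a one-hour window. The cluster name and timestamps are placeholders:
aws cloudwatch get-metric-statistics --namespace AWS/Kafka --metric-name ActiveControllerCount --dimensions Name="Cluster Name",Value="<cluster-name>" --start-time 2020-04-10T23:00:00Z --end-time 2020-04-11T00:00:00Z --period 300 --statistics Maximum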
Topic not present in metadata
"org.apache.kafka.common.errors.TimeoutException: Topic test not present in metadata after 60000 ms"
You receive the preceding error when the topic that you're trying to write to doesn't exist in Amazon MSK. Check whether the topic exists in your Amazon MSK cluster, and verify that you used the correct broker string and port in your client configuration. If the topic doesn't exist, then either create the topic in Amazon MSK, or set auto.create.topics.enable to true in your cluster configuration so that topics are automatically created.
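For example, run commands similar to the following to check whether the topic exists and to create it if it doesn't. The port and properties file depend on your authentication type:
./kafka-topics.sh --describe --bootstrap-server $BOOTSTRAP:PORT --command-config client.properties --topic test
./kafka-topics.sh --create --bootstrap-server $BOOTSTRAP:PORT --command-config client.properties --replication-factor 3 --partitions 1 --topic test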
You might also receive this error when the topic exists but the partition doesn't. For example, you have a single partition[0] and your producer tries to send to partition[1].
Make sure that your Amazon MSK cluster's security group allows inbound traffic from the security group of the client application on the appropriate ports.
If the error suddenly occurs after the system was previously working, then take the following actions to check the status of your Amazon MSK brokers:
- Check the ActiveControllerCount metric. The value must be 1. When the metric has any other value, one of the brokers in the cluster is unavailable.
- Check the ZooKeeperSessionState metric to confirm that the brokers are in continual communication with the Apache ZooKeeper nodes.
- Monitor the KafkaDataLogsDiskUsed metric to make sure that the broker didn't run out of storage space.
Verify that you didn't try to access the cluster from outside the VPC without the correct configuration. By default, Amazon MSK resources are provisioned within the VPC. You must connect over a private network in the same VPC.
If you're trying to access the cluster from outside the VPC, then make sure you set up the necessary networking configurations, such as AWS Client VPN or AWS Direct Connect.
Incorrect configuration of the Kafka client producer or consumer
To resolve an incorrect configuration of the Kafka client producer or consumer, verify that your client's configuration includes the correct bootstrap servers. Also, confirm that the configuration includes the necessary security settings and that your Kafka client and Spring Boot versions are compatible.
Errors that are specific to TLS client authentication
Bootstrap broker is disconnected
"Bootstrap broker <broker-host>:9094 (id: -<broker-id> rack: null) disconnected"
You might receive the preceding error when you try to connect to a cluster that has SSL/TLS client authentication turned on.
You might also receive this error when the producer or consumer tries to connect to an SSL/TLS-encrypted cluster over TLS port 9094 and doesn't pass the SSL/TLS configuration. To resolve this issue, set up the SSL/TLS configuration.
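A minimal SSL/TLS client configuration looks like the following sketch; the truststore path is a placeholder. Pass the file to the console clients with --producer.config or --consumer.config:
security.protocol=SSL
ssl.truststore.location=/home/ec2-user/kafka.client.truststore.jks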
In the following example, an error occurs when the producer tries to connect to the cluster:
./kafka-console-producer.sh --broker-list b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094,b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094 --topic test
Example output:
[2020-04-10 18:57:58,019] WARN [Producer clientId=console-producer] Bootstrap broker b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094 (id: -2 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2020-04-10 18:57:58,342] WARN [Producer clientId=console-producer] Bootstrap broker b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094 (id: -2 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2020-04-10 18:57:58,666] WARN [Producer clientId=console-producer] Bootstrap broker b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
In the following example, an error occurs when the consumer tries to connect to the cluster:
./kafka-console-consumer.sh --bootstrap-server b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094,b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094 --topic test
Example output:
[2020-04-10 19:09:03,277] WARN [Consumer clientId=consumer-console-consumer-79102-1, groupId=console-consumer-79102] Bootstrap broker b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2020-04-10 19:09:03,596] WARN [Consumer clientId=consumer-console-consumer-79102-1, groupId=console-consumer-79102] Bootstrap broker b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2020-04-10 19:09:03,918] WARN [Consumer clientId=consumer-console-consumer-79102-1, groupId=console-consumer-79102] Bootstrap broker b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094 (id: -2 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
If client authentication is turned on for your cluster, then you must include additional parameters for your AWS Private Certificate Authority. For more information, see Mutual TLS client authentication for Amazon MSK.
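For mutual TLS, the client's SSL/TLS configuration also needs key store parameters, as in the following sketch. The paths and passwords are placeholders:
security.protocol=SSL
ssl.truststore.location=/home/ec2-user/kafka.client.truststore.jks
ssl.keystore.location=/home/ec2-user/kafka.client.keystore.jks
ssl.keystore.password=<your-keystore-password>
ssl.key.password=<your-key-password>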
Key store access error
"ERROR Modification time of key store could not be obtained: <configure-path-to-truststore>"
-or-
"Failed to load keystore"
You might receive the preceding errors when the truststore is incorrectly configured or when the truststore files fail to load for the producer or consumer. To resolve this issue, provide the correct path for the truststore file in the SSL/TLS configuration.
Example consumer command with an incorrect truststore path:
./kafka-console-consumer.sh --bootstrap-server b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094,b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9094 --topic test --consumer.config /home/ec2-user/ssl.config
Example output:
[2020-04-11 10:39:12,194] ERROR Modification time of key store could not be obtained: /home/ec2-ser/certs/kafka.client.truststore.jks (org.apache.kafka.common.security.ssl.SslEngineBuilder)
java.nio.file.NoSuchFileException: /home/ec2-ser/certs/kafka.client.truststore.jks
[2020-04-11 10:39:12,253] ERROR Unknown error when running consumer: (kafka.tools.ConsoleConsumer$)
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /home/ec2-ser/certs/kafka.client.truststore.jks of type JKS
This error might also occur when your truststore or key store file is corrupted or the truststore file password is incorrect.
SSL/TLS handshake failure
"Error when sending message to topic test with key: null, value: 0 bytes with error (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed"
-or-
"Connection to node -<broker-id> (<broker-hostname>/<broker-hostname>:9094) failed authentication due to: SSL handshake failed (org.apache.kafka.clients.NetworkClient)"
You might receive one of the preceding errors when the producer's or consumer's key store is incorrectly configured and authentication fails. Make sure that you correctly configure the key store.
Example producer command with an incorrectly configured key store:
./kafka-console-producer.sh --broker-list b-2.tlscluster.5818ll.c7.kafka.us-east-1.amazonaws.com:9094,b-1.tlscluster.5818ll.c7.kafka.us-east-1.amazonaws.com:9094,b-4.tlscluster.5818ll.c7.kafka.us-east-1.amazonaws.com:9094 --topic example --producer.config /home/ec2-user/ssl.config
Example output:
[2020-04-11 11:13:19,286] ERROR [Producer clientId=console-producer] Connection to node -3 (b-4.tlscluster.5818ll.c7.kafka.us-east-1.amazonaws.com/172.31.6.195:9094) failed authentication due to: SSL handshake failed (org.apache.kafka.clients.NetworkClient)
Example consumer command with an incorrectly configured key store:
./kafka-console-consumer.sh --bootstrap-server b-2.tlscluster.5818ll.c7.kafka.us-east-1.amazonaws.com:9094,b-1.tlscluster.5818ll.c7.kafka.us-east-1.amazonaws.com:9094,b-4.tlscluster.5818ll.c7.kafka.us-east-1.amazonaws.com:9094 --topic example --consumer.config /home/ec2-user/ssl.config
Example output:
[2020-04-11 11:14:46,958] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-46876] Connection to node -1 (b-2.tlscluster.5818ll.c7.kafka.us-east-1.amazonaws.com/172.31.15.140:9094) failed authentication due to: SSL handshake failed (org.apache.kafka.clients.NetworkClient)
[2020-04-11 11:14:46,961] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Keystore password is incorrect
"java.io.IOException: keystore password was incorrect"
You might receive the preceding error when the password for the key store or truststore is incorrect.
To troubleshoot this issue, run the following command to check whether the key store or truststore password is correct:
keytool -list -keystore kafka.client.keystore.jks
Example output:
Enter keystore password:
Keystore type: PKCS12
Keystore provider: SUN
Your keystore contains 1 entry
schema-reg, Jan 15, 2020, PrivateKeyEntry,
Certificate fingerprint (SHA1): 4A:F3:2C:6A:5D:50:87:3A:37:6C:94:5E:05:22:5A:1A:D5:8B:95:ED
If the password for the key store or truststore is incorrect, then you receive the following error:
"keytool error: java.io.IOException: keystore password was incorrect"
To view the verbose output of the previous command, add the -v flag:
keytool -list -v -keystore kafka.client.keystore.jks
You can also run the preceding commands to check whether the key store is corrupted.
You might also receive this error when you incorrectly configure the secret key that's associated with the alias in the producer and consumer SSL/TLS configuration. To check whether this is the issue, run the following command:
keytool -keypasswd -alias schema-reg -keystore kafka.client.keystore.jks
If your password for the alias's secret is correct, then you're asked to enter a new password for the secret key:
Enter keystore password:
Enter key password for <schema-reg>
New key password for <schema-reg>:
Re-enter new key password for <schema-reg>:
Otherwise, the command fails with the following message:
"keytool error: java.security.UnrecoverableKeyException: Get Key failed: Given final block not properly padded. Such issues can arise if a bad key is used during decryption."
To verify whether an alias is part of the key store, run the following command:
keytool -list -keystore kafka.client.keystore.jks -alias schema-reg
Example output:
Enter keystore password:
schema-reg, Jan 15, 2020, PrivateKeyEntry,
Certificate fingerprint (SHA1): 4A:F3:2C:6A:5D:50:87:3A:37:6C:94:5E:05:22:5A:1A:D5:8B:95:ED
Errors that are specific to IAM client authentication
Failed authentication, access denied
"Connection to node -1 (b-1.testcluster.abc123.c2.kafka.us-east-1.amazonaws.com/10.11.111.123:9098) failed authentication due to: Access denied"
-or-
"org.apache.kafka.common.errors.SaslAuthenticationException: Access denied"
You receive one of the preceding errors when an access policy, permission boundary, or SCP denies the user the required authorization.
To resolve this issue, use IAM access control to make sure that the IAM role can perform cluster operations.
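As a sketch, an IAM policy statement that allows basic cluster operations has the following shape. The Region, account ID, and cluster name in the resource ARN are placeholders; add topic-level actions, such as kafka-cluster:WriteData and kafka-cluster:ReadData, as needed:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kafka-cluster:Connect",
        "kafka-cluster:DescribeCluster"
      ],
      "Resource": "arn:aws:kafka:us-east-1:111122223333:cluster/testcluster/*"
    }
  ]
}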
SaslAuthenticationException
"org.apache.kafka.common.errors.SaslAuthenticationException: Too many connects"
-or-
"org.apache.kafka.common.errors.SaslAuthenticationException: Internal error"
You receive the preceding errors when you run your cluster on the kafka.t3.small broker type with IAM access control and you exceed the connection quota. The kafka.t3.small instance type accepts only one TCP connection for each broker per second. When you exceed the connection quota, connection attempts fail. For more information, see How Amazon MSK works with IAM.
To resolve these issues, take the following actions:
- In your Amazon MSK Connect worker configuration, update the values for reconnect.backoff.ms and reconnect.backoff.max.ms to 1000 or higher, as shown in the example after this list.
- Upgrade to a larger broker instance type, such as kafka.m5.large. For more information, see Right-size your cluster: Number of partitions per Standard broker.
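For example, the backoff settings in the worker configuration might look like the following:
reconnect.backoff.ms=1000
reconnect.backoff.max.ms=2000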
Errors that are specific to SASL/SCRAM client authentication
Client SASL mechanism is turned off
"Connection to node -1 (b-1-testcluster.abc123.c7.kafka.us-east-1.amazonaws.com/3.11.111.123:9098) failed authentication due to: Client SASL mechanism 'SCRAM-SHA-512' not enabled in the server, enabled mechanisms are [AWS_MSK_IAM]"
-or-
"Connection to node -1 (b-1-testcluster.abc123.c7.kafka.us-east-1.amazonaws.com/3.11.111.123:9096) failed authentication due to: Client SASL mechanism 'AWS_MSK_IAM' not enabled in the server, enabled mechanisms are [SCRAM-SHA-512]"
You receive the preceding errors when the port number doesn't match the Simple Authentication and Security Layer (SASL) mechanism in the client properties file. This is the properties file that you use in the command to run cluster operations.
To communicate with brokers in a cluster that uses Simple Authentication and Security Layer/Salted Challenge Response Authentication Mechanism (SASL/SCRAM), use the following ports:
- Port 9096 for access from within AWS
- Port 9196 for public access
To communicate with brokers in a cluster that uses IAM access control, use port 9098 for access from within AWS and port 9198 for public access.
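For reference, a client.properties file for SASL/SCRAM typically looks like the following sketch. The user name, password, and truststore path are placeholders. Note that the JAAS configuration uses ScramLoginModule, not PlainLoginModule:
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="msk-user" password="msk-password";
ssl.truststore.location=/home/ec2-user/kafka.client.truststore.jks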
SASL credential authentication error
"Connection to node -1 (b-3.testcluster.abc123.c2.kafka.us-east-1.amazonaws.com/10.11.111.123:9096) failed authentication due to: Authentication failed during authentication due to invalid credentials with SASL mechanism SCRAM-SHA-512"
Make sure that you stored the user credentials in AWS Secrets Manager and associated the credentials with the Amazon MSK cluster.
When you access the cluster over port 9096, the user name and password in AWS Secrets Manager must match the credentials in the client properties file.
When you run the get-secret-value command to retrieve the secret, make sure that the password in AWS Secrets Manager doesn't contain any special characters.
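For example, the following AWS CLI command retrieves a secret. For Amazon MSK, the secret name must begin with the prefix AmazonMSK_; the rest of the name is a placeholder:
aws secretsmanager get-secret-value --secret-id AmazonMSK_testsecret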
ClusterAuthorizationException
"org.apache.kafka.common.errors.ClusterAuthorizationException: Request Request(processor=11, connectionId=INTERNAL_IP-INTERNAL_IP-0, session=Session(User:ANONYMOUS,/INTERNAL_IP), listenerName=ListenerName(REPLICATION_SECURE), securityProtocol=SSL, buffer=null) is not authorized"
You receive the preceding error when both the following conditions are true:
- SASL/SCRAM authentication is turned on for your Amazon MSK cluster.
- resourceType is set to CLUSTER and operation is set to CLUSTER_ACTION in the ACLs for your cluster.
The Amazon MSK cluster doesn't support the preceding settings because the settings prevent the internal Apache Kafka replication. The brokers' identities appear as ANONYMOUS for inter-broker communication. If your cluster must support the ACLs and use the SASL/SCRAM authentication, then allow the ANONYMOUS user to use the ALL operation.
Run the following command to grant the ALL operation to the ANONYMOUS user:
./kafka-acls.sh --authorizer-properties zookeeper.connect=example-ZookeeperConnectString --add --allow-principal User:ANONYMOUS --operation ALL --cluster
Related information
Connect to an Amazon MSK Provisioned cluster
How do I troubleshoot common issues when using my Amazon MSK cluster with SASL/SCRAM authentication?