How do I troubleshoot connectivity issues with Amazon MSK clusters and MSK Connect configurations?
I want to troubleshoot connectivity issues with Amazon Managed Streaming for Apache Kafka (Amazon MSK) and MSK Connect configurations.
Resolution
You might experience connectivity issues caused by network configurations, authentication settings, security group rules, and AWS Identity and Access Management (IAM) permissions.
When you use IAM access control, SASL/SCRAM, or mutual TLS (mTLS) authentication, you must turn on TLS encryption in transit between clients and brokers. The cluster must be in the ACTIVE state before you can update its security settings.
Important: For public access, use port 9194 for TLS, port 9196 for SASL/SCRAM, and port 9198 for IAM access control. For more information, see Port information.
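Scripts that build client configurations can keep the port rules in a small lookup table. The following Python sketch is illustrative only. The public-access ports come from the note above, and the in-VPC ports (9092 plaintext, 9094 TLS, 9096 SASL/SCRAM, 9098 IAM) are the standard Amazon MSK ports for the same methods. The names PRIVATE_PORTS, PUBLIC_PORTS, and msk_port are hypothetical.

```python
# Amazon MSK broker ports by authentication method.
PRIVATE_PORTS = {      # clients inside the cluster's VPC
    "plaintext": 9092,
    "tls": 9094,
    "sasl_scram": 9096,
    "iam": 9098,
}
PUBLIC_PORTS = {       # public access has no plaintext option
    "tls": 9194,
    "sasl_scram": 9196,
    "iam": 9198,
}


def msk_port(auth: str, public: bool = False) -> int:
    """Return the broker port for an authentication method (hypothetical helper)."""
    table = PUBLIC_PORTS if public else PRIVATE_PORTS
    if auth not in table:
        scope = "public" if public else "private"
        raise ValueError(f"no {scope} port for authentication method {auth!r}")
    return table[auth]
```

For example, msk_port("iam", public=True) returns 9198, and msk_port("iam") returns 9098.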
Resolve timeout errors when you connect to an MSK cluster
If there are network connectivity issues between your client and the MSK cluster brokers, then you might receive the following error message:
"TimeoutException: Timed out waiting for a node assignment"
To resolve the preceding error, test the network connectivity.
To verify bootstrap server connectivity, run one of the following commands from your client machine:
telnet BOOTSTRAP_SERVER PORT
-or-
nc -zvv BOOTSTRAP_SERVER PORT
Note: Replace BOOTSTRAP_SERVER with a broker hostname from your bootstrap broker string, and replace PORT with the port for your authentication method.
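If telnet and nc aren't installed on the client, you can run the same TCP-level check in Python with only the standard library. This is a minimal sketch: can_connect and check_bootstrap are hypothetical helper names. Like nc -z, they only confirm that the TCP handshake completes. A successful result doesn't prove that TLS, SASL/SCRAM, or IAM authentication will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


def check_bootstrap(bootstrap: str, timeout: float = 5.0) -> dict:
    """Check each host:port pair in a comma-separated bootstrap broker string."""
    results = {}
    for endpoint in (e.strip() for e in bootstrap.split(",")):
        if not endpoint:
            continue
        host, _, port = endpoint.rpartition(":")
        results[endpoint] = can_connect(host, int(port), timeout)
    return results
```

Pass the full bootstrap broker string to check_bootstrap to get a dictionary that maps each broker endpoint to True or False. A False entry for every broker usually points to security group, network ACL, or routing problems rather than authentication settings.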
If the test fails, then check that the MSK cluster's security group has an inbound rule that allows traffic from your client's security group or virtual private cloud (VPC) CIDR on the required authentication port. Also, verify that your client's security group has an outbound rule that allows traffic to the MSK cluster's security group on the same port.
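You can also check the rule logic offline. The following Python sketch assumes that the rules are shaped like the IpPermissions entries that the Amazon EC2 DescribeSecurityGroups API returns (IpProtocol, FromPort, ToPort, IpRanges, and UserIdGroupPairs). The function name allows_traffic is hypothetical. The sketch ignores IPv6 ranges, prefix lists, and network ACLs.

```python
import ipaddress
from typing import Iterable, Optional


def allows_traffic(permissions: Iterable[dict], port: int,
                   peer_ip: Optional[str] = None,
                   peer_sg: Optional[str] = None) -> bool:
    """Return True if any rule allows TCP traffic on port for the given peer."""
    for perm in permissions:
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            port_ok = True  # "-1" means all protocols and all ports
        elif protocol in ("tcp", "6"):
            port_ok = perm["FromPort"] <= port <= perm["ToPort"]
        else:
            continue  # UDP, ICMP, and so on don't carry Kafka traffic
        if not port_ok:
            continue
        # Peer referenced by security group ID
        if peer_sg and any(pair.get("GroupId") == peer_sg
                           for pair in perm.get("UserIdGroupPairs", [])):
            return True
        # Peer referenced by an IPv4 CIDR range
        if peer_ip and any(
            ipaddress.ip_address(peer_ip) in ipaddress.ip_network(r["CidrIp"])
            for r in perm.get("IpRanges", [])
        ):
            return True
    return False
```

To check the cluster side, pass the cluster security group's inbound IpPermissions with the client's IP address or security group ID as the peer. To check the client side, pass the client security group's IpPermissionsEgress list with a broker IP address or the cluster's security group ID as the peer.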
Important: By default, your client can access an MSK Provisioned cluster only if it's in the same Amazon Virtual Private Cloud (Amazon VPC) as the cluster, and all communication between your Kafka clients and the cluster is private. Security groups allow all outbound traffic by default, so outbound traffic is blocked only if you explicitly removed or restricted the outbound rules.
Resolve IAM authentication issues
If you connect multiple IAM-authenticated clients to an MSK broker at the same time, or reconnect clients at a high frequency without proper backoff, then you might receive the following error message:
"SaslAuthenticationException: Access denied or Failed to acquire SASL OAUTHBEARER token"
To resolve the error, increase the reconnect.backoff.ms configuration parameter. For more information, see reconnect.backoff.ms on the Apache Kafka website.
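To see how the backoff setting spaces out reconnect attempts, the following Python sketch models the delay before each attempt. It assumes that the client doubles the delay after each failed attempt and caps it at reconnect.backoff.max.ms. This approximates the exponential reconnect backoff of the Apache Kafka Java client, but exact growth and jitter behavior can differ between client versions and libraries. The properties in client_properties are the standard settings for Java clients that use the aws-msk-iam-auth library. Clients that authenticate with the OAUTHBEARER mechanism use different SASL settings. Both function names are hypothetical, and the backoff values are illustrative, not recommendations.

```python
import random
from typing import List, Optional


def reconnect_delays(backoff_ms: int = 50, max_backoff_ms: int = 1000,
                     attempts: int = 6, jitter: float = 0.0,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Approximate the delay (ms) before each reconnect attempt.

    Assumption: the delay starts at reconnect.backoff.ms, doubles after
    each consecutive failure, and is capped at reconnect.backoff.max.ms.
    A nonzero jitter randomizes each delay by up to +/- that fraction.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        base = min(backoff_ms * 2 ** attempt, max_backoff_ms)
        factor = 1 + rng.uniform(-jitter, jitter) if jitter else 1
        delays.append(round(base * factor))
    return delays


def client_properties(backoff_ms: int, max_backoff_ms: int) -> str:
    """Render client.properties text for a Java client that uses IAM access control."""
    props = {
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "AWS_MSK_IAM",
        "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
        "sasl.client.callback.handler.class":
            "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        # Wait longer between reconnect attempts to avoid bursts of new
        # IAM-authenticated connections.
        "reconnect.backoff.ms": str(backoff_ms),
        "reconnect.backoff.max.ms": str(max_backoff_ms),
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())
```

For example, reconnect_delays(1000, 10000, 5) returns [1000, 2000, 4000, 8000, 10000]. Adding jitter spreads reconnects from many clients so that they don't all hit the broker at the same moment.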
Resolve Amazon MSK Connect connector creation failures
When you create an MSK Connect connector that can't reach the MSK brokers specified in its configuration, the connector might get stuck in the CREATING or FAILED state, and you might receive the following error message:
"KafkaConnect.BrokerUnreachable"
If the connector remains in the CREATING state after you deploy it, then open the Amazon CloudWatch log group that you specified in the connector's creation request, and review the logs for errors. Also, make sure that the IAM roles have the correct permissions attached. Then, check that the security groups and network ACLs allow traffic between the connector's VPC and the MSK cluster's VPC, including in cross-account MSK Connect configurations.
To configure network access and necessary IAM permissions, complete the following steps:
- Review the security groups to make sure that you configured both the inbound and outbound traffic rules.
- Configure your service execution role so that MSK Connect can assume it.
Example trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "kafkaconnect.amazonaws.com" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "aws:SourceAccount": "account-id" }
      }
    }
  ]
}
Note: Replace account-id with your AWS account ID.
- If you receive authorization errors, then add the following statement with the iam:GetRole and iam:PassRole permissions to the identity-based policy of the IAM user or role that creates the connector:
{
  "Effect": "Allow",
  "Action": ["iam:GetRole", "iam:PassRole"],
  "Resource": "arn:aws:iam::account-id:role/your-msk-connect-role"
}
Note: Replace account-id with your AWS account ID. Replace your-msk-connect-role with the name of the IAM role that you created in the account that holds the source or sink MSK cluster.
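Before you create the connector, you can check a role's trust policy document for the MSK Connect service principal. The following Python sketch is a simplified check, not a full IAM policy evaluator. It looks only for an Allow statement that names kafkaconnect.amazonaws.com and the exact action sts:AssumeRole. It doesn't expand wildcards, and it evaluates only the aws:SourceAccount StringEquals condition. The function name trusts_msk_connect is hypothetical.

```python
import json
from typing import Optional

MSK_CONNECT_PRINCIPAL = "kafkaconnect.amazonaws.com"


def _as_list(value):
    """IAM policy fields can be a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def trusts_msk_connect(policy_text: str, account_id: Optional[str] = None) -> bool:
    """Return True if the trust policy appears to let MSK Connect assume the role."""
    policy = json.loads(policy_text)
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        services = _as_list(principal.get("Service", [])) if isinstance(principal, dict) else []
        actions = _as_list(stmt.get("Action", []))
        if MSK_CONNECT_PRINCIPAL not in services or "sts:AssumeRole" not in actions:
            continue
        # Only the aws:SourceAccount StringEquals condition is evaluated here.
        source = (stmt.get("Condition", {})
                  .get("StringEquals", {})
                  .get("aws:SourceAccount"))
        if account_id is None or source is None or account_id in _as_list(source):
            return True
    return False
```

To use it, pass the text of the example trust policy with account-id replaced by your AWS account ID, and pass the same account ID as the second argument.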
Resolve Debezium connector task errors
The Debezium connector for MySQL supports only a single task because it must read the MySQL binary log (binlog) sequentially. If you set tasks.max to a value greater than 1, then you receive the following error message:
"IllegalArgumentException: Only a single connector task may be started"
To resolve this issue, use provisioned capacity mode with workerCount set to 1, and set tasks.max to 1. If your MSK cluster uses IAM access control, then the connector configuration must also include IAM authentication settings for the schema history producer and consumer.
Example configuration:
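{
  "connector.class": "io.debezium.connector.mysql.MySqlConnector",
  "tasks.max": "1",
  "database.hostname": "your-rds-endpoint",
  "database.port": "3306",
  "database.user": "debezium",
  "database.password": "***",
  "database.include.list": "your-database",
  "topic.prefix": "mskconnect",
  "schema.history.internal.kafka.bootstrap.servers": "broker-endpoints:9098",
  "schema.history.internal.kafka.topic": "schema-history.internal",
  "schema.history.internal.consumer.security.protocol": "SASL_SSL",
  "schema.history.internal.consumer.sasl.mechanism": "AWS_MSK_IAM",
  "schema.history.internal.consumer.sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
  "schema.history.internal.consumer.sasl.client.callback.handler.class": "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
  "schema.history.internal.producer.security.protocol": "SASL_SSL",
  "schema.history.internal.producer.sasl.mechanism": "AWS_MSK_IAM",
  "schema.history.internal.producer.sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
  "schema.history.internal.producer.sasl.client.callback.handler.class": "software.amazon.msk.auth.iam.IAMClientCallbackHandler"
}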
Note: Replace your-rds-endpoint with your Amazon Relational Database Service (Amazon RDS) for MySQL endpoint. Replace *** with the password of the debezium database user. Replace your-database with the name of your database. Replace broker-endpoints:9098 with your MSK cluster's bootstrap broker string for IAM access control.
For more information, see Debezium connector for MySQL on the Debezium website.
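The following Python sketch checks a connector configuration against the requirements above before you deploy it. It checks only tasks.max, the worker count, the schema history bootstrap servers, and the schema history security.protocol and sasl.mechanism settings from the example configuration. It doesn't contact the database or the cluster. The function name debezium_config_problems is hypothetical, and worker_count stands for the workerCount value of the connector's provisioned capacity.

```python
from typing import Dict, List

# Settings from the example configuration that both schema history
# clients need when the MSK cluster uses IAM access control.
SCHEMA_HISTORY_IAM = {
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "AWS_MSK_IAM",
}


def debezium_config_problems(config: Dict[str, str], worker_count: int = 1) -> List[str]:
    """List problems that would block a Debezium MySQL connector on MSK Connect."""
    problems = []
    # Kafka Connect treats a missing tasks.max as 1.
    if str(config.get("tasks.max", "1")) != "1":
        problems.append("tasks.max must be 1 (a single task reads the binlog)")
    if worker_count != 1:
        problems.append("workerCount must be 1 in provisioned capacity mode")
    if not config.get("schema.history.internal.kafka.bootstrap.servers"):
        problems.append("schema.history.internal.kafka.bootstrap.servers is missing")
    for client in ("consumer", "producer"):
        for key, expected in SCHEMA_HISTORY_IAM.items():
            full_key = f"schema.history.internal.{client}.{key}"
            if config.get(full_key) != expected:
                problems.append(f"{full_key} should be {expected}")
    return problems
```

An empty list means that the configuration passed these checks. It doesn't guarantee that the connector will start.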
Resolve cross-account connectivity issues
To resolve cross-account connectivity issues, use multi-VPC private connectivity.
You can use multi-VPC private connectivity to connect Kafka clients that are hosted in different VPCs and AWS accounts to an Amazon MSK cluster. Before you set up multi-VPC private connectivity, review the requirements and limitations.
Note: When you're using the SASL/SCRAM or mTLS access-control methods, you must set Apache Kafka ACLs for your cluster. You must also update the cluster's configuration to set the allow.everyone.if.no.acl.found property to false. For more information, see Cluster auth type and topic access permissions.
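You can confirm the ACL setting in a cluster configuration before you apply it by parsing the configuration text. This Python sketch assumes simple key=value lines. It doesn't handle every Java properties syntax rule, such as colon separators or line continuations. It treats a missing allow.everyone.if.no.acl.found entry as not compliant because the setting must be explicitly false. The function names are hypothetical.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def acl_default_deny(config_text: str) -> bool:
    """True only if allow.everyone.if.no.acl.found is explicitly set to false."""
    value = parse_properties(config_text).get("allow.everyone.if.no.acl.found")
    return value is not None and value.lower() == "false"
```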
Resolve VPC endpoint DNS conflicts
If you create a VPC endpoint and turn on private DNS names, then a private hosted zone for the service is automatically created. DNS queries for domain names that end with kafka.<region>.amazonaws.com are then answered from that private hosted zone. Because the zone doesn't contain records for your broker endpoints, these queries return NXDOMAIN responses, and your application fails to resolve the bootstrap broker endpoints.
To resolve this issue, first turn off the private DNS option on the VPC endpoint to restore DNS resolution. Then, to control how these names resolve, set up an Amazon Route 53 private hosted zone with CNAME records that map the MSK broker fully qualified domain names (FQDNs) to the VPC endpoint DNS names.
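The conflict happens because the private hosted zone answers for every name under kafka.<region>.amazonaws.com, including your broker endpoints. The following Python sketch shows that label-by-label suffix match, plus a standard-library lookup that reports the failure symptom. The function names are hypothetical, and zone_covers only compares names. It doesn't query Route 53.

```python
import socket


def zone_covers(hostname: str, zone: str) -> bool:
    """Return True if hostname falls under the hosted zone named zone.

    Compares whole DNS labels, ignoring case and a trailing dot, so that
    notkafka.us-east-1.amazonaws.com isn't treated as part of
    kafka.us-east-1.amazonaws.com.
    """
    host_labels = hostname.rstrip(".").lower().split(".")
    zone_labels = zone.rstrip(".").lower().split(".")
    return host_labels[-len(zone_labels):] == zone_labels


def resolves(hostname: str) -> bool:
    """Return True if the local resolver returns at least one address."""
    try:
        return bool(socket.getaddrinfo(hostname, None))
    except socket.gaierror:
        # NXDOMAIN and similar lookup failures surface as gaierror.
        return False
```

For example, zone_covers("b-1.demo.abc123.c2.kafka.us-east-1.amazonaws.com", "kafka.us-east-1.amazonaws.com") returns True. The broker hostname is a made-up example. If a broker name is covered by the zone and resolves() returns False from inside the VPC, then the private DNS conflict is the likely cause.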
Review the following best practices:
- Plan VPC endpoint creation carefully to avoid DNS conflicts with existing MSK clusters.
- Document DNS resolution requirements before you implement VPC endpoints.
- Use Route 53 private hosted zones to manage MSK endpoint resolution in multi-endpoint scenarios.
Resolve CloudShell connection timeouts
AWS CloudShell doesn't automatically connect to MSK clusters because the default CloudShell environment doesn't run in your VPC. You must configure a CloudShell VPC environment. Until you do, you receive timeout errors from CloudShell even if your Kafka command line interface (CLI) tools are installed correctly.
To resolve this issue, complete the following steps:
- Configure CloudShell's VPC environment to connect to the MSK cluster's VPC.
- Verify that the MSK cluster's security group allows inbound traffic from the CloudShell VPC environment on port 9098 for IAM authentication.
- Use Amazon Elastic Compute Cloud (Amazon EC2) instances for persistent connections.
Note: CloudShell removes the VPC environment after 30 minutes of inactivity.
Resolve MSK Connect internet access for external services issues
MSK Connect workers that run in public subnets instead of private subnets can't access resources on the internet. If MSK Connect tries to connect to external services such as Snowflake or external databases, then you might receive timeout errors.
To resolve this issue, make sure that the MSK Connect workers have the correct network connectivity. To access public internet resources, place the workers in private subnets and use a NAT gateway. To access AWS services privately, use VPC endpoints. Configure the route tables of the workers' subnets to route internet traffic (0.0.0.0/0) through the NAT gateway.
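To confirm the routing for the subnets that your connector uses, you can inspect their route table entries. This Python sketch assumes entries shaped like the Routes list that the Amazon EC2 DescribeRouteTables API returns (DestinationCidrBlock, NatGatewayId, and GatewayId). The function names are hypothetical, and the sketch checks only the IPv4 default route.

```python
from typing import Iterable, Optional, Tuple


def default_route_target(routes: Iterable[dict]) -> Tuple[Optional[str], Optional[str]]:
    """Classify the 0.0.0.0/0 route as ("nat", id), ("igw", id), ("other", id), or (None, None)."""
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId"):
            return "nat", route["NatGatewayId"]
        gateway = route.get("GatewayId", "")
        if gateway.startswith("igw-"):
            return "igw", gateway
        # For example, a transit gateway or network interface target.
        return "other", gateway or None
    return None, None


def worker_subnet_ok(routes: Iterable[dict]) -> bool:
    """MSK Connect worker subnets should send internet traffic to a NAT gateway."""
    return default_route_target(routes)[0] == "nat"
```

A subnet whose default route points to an internet gateway (igw-) is a public subnet. The workers in that subnet can't reach the internet because they don't route through a NAT gateway.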
Note: If you use VPC peering or AWS Transit Gateway with MSK Connect, then don't configure your connector to reach peered VPC resources that have IP addresses in the following CIDR ranges: 10.99.0.0/16, 192.168.0.0/16, and 172.21.0.0/16.
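Before you configure the connector, you can check the addresses of the peered resources against these ranges. This Python sketch uses the standard ipaddress module. The range list is copied from the note above, and the function name reserved_conflicts is hypothetical. It accepts a single IP address or a CIDR block.

```python
import ipaddress
from typing import List

# Ranges from the note above that MSK Connect can't use to reach
# resources over VPC peering or AWS Transit Gateway.
RESERVED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.99.0.0/16", "192.168.0.0/16", "172.21.0.0/16")
]


def reserved_conflicts(target: str) -> List[str]:
    """Return the reserved ranges that overlap an IP address or CIDR block."""
    network = ipaddress.ip_network(target, strict=False)
    return [str(r) for r in RESERVED_RANGES if network.overlaps(r)]
```

For example, reserved_conflicts("10.99.12.7") returns ["10.99.0.0/16"]. A broad block such as 10.0.0.0/8 also conflicts because it contains 10.99.0.0/16.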
Related information
Troubleshoot your Amazon MSK cluster