How do I troubleshoot an AWS IoT rule that fails to send messages to an Amazon MSK topic?

I want to troubleshoot an AWS IoT Core rule that fails to send messages to an Amazon Managed Streaming for Apache Kafka (Amazon MSK) topic.

Short description

When an AWS IoT rule fails to publish a message to an Amazon MSK cluster, you might receive one of the following error messages:

  • "KafkaAction failed to send a message to the specified bootstrap servers. Topic <topic_name> not present in metadata after 1000 ms."
  • "KafkaAction failed to send a message to the specified bootstrap servers. SSL handshake failed."
  • "KafkaAction failed to send a message to the specified bootstrap servers. An unknown error occurred."
  • "KafkaAction failed to send a message to the specified bootstrap servers. No resolvable bootstrap urls given in bootstrap.servers."
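
When you scan AWS IoT logs for these failures, it can help to group entries by error type. The following Python sketch maps a logged message to one of the four errors listed above. The function name, the labels, and the matching substrings are illustrative assumptions and aren't part of any AWS API.

```python
# Substrings taken from the error messages listed above, paired with
# illustrative labels (the labels are assumptions, not AWS identifiers).
ERROR_PATTERNS = [
    ("not present in metadata", "topic-metadata"),
    ("ssl handshake failed", "ssl-handshake"),
    ("no resolvable bootstrap urls", "unresolvable-bootstrap"),
    ("an unknown error occurred", "unknown-error"),
]

KAFKA_ACTION_PREFIX = "kafkaaction failed to send a message"


def classify_kafka_action_error(message):
    """Return a label for a KafkaAction error, or None if it isn't one."""
    lowered = message.lower()
    if KAFKA_ACTION_PREFIX not in lowered:
        return None
    for needle, label in ERROR_PATTERNS:
        if needle in lowered:
            return label
    return "unrecognized"
```

The label tells you which troubleshooting section in the Resolution applies to a given log entry.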

Resolution

Before you begin to troubleshoot, complete the following steps:

  1. Configure AWS IoT logging in the same AWS Region as the AWS IoT rule.
  2. Verify the setup of the AWS IoT rule and the Amazon MSK cluster. For more information, see "Step 3. Set up Kafka producer and consumer on AWS Cloud9 to test the setup" in Field Notes: Deliver messages using an IoT rule action to Amazon Managed Streaming for Apache Kafka.
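
As part of verifying the rule setup, you can run a quick offline check of the rule's Kafka action definition. The following Python sketch assumes that the action is available as a dictionary with the fields destinationArn, topic, and clientProperties, and that clientProperties contains bootstrap.servers. Confirm the field names against the current AWS IoT documentation. The function name and the sample values are hypothetical.

```python
# Fields assumed to be required in the Kafka action definition.
REQUIRED_FIELDS = ("destinationArn", "topic", "clientProperties")


def find_kafka_action_problems(kafka_action):
    """Return a list of human-readable problems found in a Kafka action dict."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not kafka_action.get(field):
            problems.append(f"missing or empty field: {field}")

    properties = kafka_action.get("clientProperties") or {}
    servers = properties.get("bootstrap.servers", "")
    if not servers:
        problems.append("clientProperties is missing bootstrap.servers")

    # Each bootstrap server entry is expected to look like host:port.
    for entry in filter(None, (s.strip() for s in servers.split(","))):
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            problems.append(f"bootstrap server is not host:port: {entry}")
    return problems
```

An empty list means that the sketch found nothing obviously wrong. It doesn't prove that the rule can reach the cluster.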

Troubleshoot based on the KafkaAction error message that you received

Note: Make sure that you use the correct port numbers to communicate with client machines.

Error: KafkaAction failed to send a message to the specified bootstrap servers. Topic not present in metadata after 1000 ms.

This error occurs when AWS IoT Core can't access the metadata of the topic that's defined on the Amazon MSK cluster. To troubleshoot this error, complete the following steps:

  1. Check if the topic is on the Amazon MSK cluster. 
    Note: Replace example-topic-name with the name of your topic.

    ./bin/kafka-topics.sh --list --zookeeper $ZOOKEEPER_STRING | grep example-topic-name
  2. Check if the correct connection strings for the bootstrap server and ZooKeeper are in the AWS IoT rule configuration. You can find the bootstrap server and ZooKeeper connection strings on the Client information page in the Amazon MSK settings.

  3. Check the security group that's mapped to the cluster. The security group must allow inbound traffic to ports that are mapped for the bootstrap server from the Amazon Virtual Private Cloud (Amazon VPC) destination.

  4. Check if the ports for ZooKeeper allow inbound traffic. ZooKeeper uses port 2181 for plaintext traffic and port 2182 for TLS-encrypted traffic.

  5. (Optional) If the Amazon VPC destination and cluster don't share the same Amazon VPC and subnet, then create a NAT gateway in your subnets. This allows you to forward messages from AWS IoT Core to a public Amazon MSK cluster. For more information, see Connecting to an Amazon MSK cluster.

  6. Launch an Amazon Elastic Compute Cloud (Amazon EC2) instance in the same subnet as the Amazon MSK cluster.

  7. Check if the ports are open:
    Note: Replace example-port-number with the port number.

    Bootstrap:

    telnet bootstrap-broker example-port-number

    ZooKeeper:

    telnet Apache-ZooKeeper-node example-port-number
  8. Check if the AWS Identity and Access Management (IAM) role that's attached to the AWS IoT rule has the correct permissions. The IAM role must have the permissions to manage elastic network interfaces in the Amazon VPC:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:CreateNetworkInterface",
                    "ec2:DescribeNetworkInterfaces",
                    "ec2:CreateNetworkInterfacePermission",
                    "ec2:DeleteNetworkInterface",
                    "ec2:DescribeSubnets",
                    "ec2:DescribeVpcs",
                    "ec2:DescribeVpcAttribute",
                    "ec2:DescribeSecurityGroups"
                ],
                "Resource": "*"
            }
        ]
    }
  9. If the Amazon MSK cluster is configured with a username and password, then check that the IAM role's policy includes the following AWS Secrets Manager permissions:

    {
        "Effect": "Allow",
        "Action": [
            "secretsmanager:GetSecretValue",
            "secretsmanager:DescribeSecret"
        ],
        "Resource": "arn:aws:secretsmanager:region:account-id:"
    }
  10. Check if the trust policy allows AWS IoT Core to assume the role:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "iot.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
  11. (Optional) If you use a customer managed key to encrypt the data at rest, then make sure that the IAM role has permissions to use the AWS Key Management Service (AWS KMS) key. The following example key policy statement grants AWS KMS permissions to the IAM role:
    Note: Replace example-account-id with your account ID and example-iam-role with your IAM role.

    {
        "Sid": "Enable IAM User Permissions",
        "Effect": "Allow",
        "Principal": {
            "AWS": [
                "arn:aws:iam::example-account-id:role/example-iam-role",
                "arn:aws:iam::example-account-id:root"
            ]
        },
        "Action": "kms:*",
        "Resource": "*"
    }
  12. Check that the IAM policy of the role that the AWS IoT rule assumes grants permissions to perform AWS KMS actions.

  13. Check if the relevant partition is on the Amazon MSK cluster. For example, if the topic has a single partition (0) and the AWS IoT rule tries to write to partition 1, then the same error occurs.
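
If telnet isn't installed on the EC2 instance, then you can script the port checks in steps 6 and 7 instead. The following Python sketch uses only the standard library. It splits a comma-separated connection string into host and port pairs and tries a TCP connection to each one. The function names and the timeout value are illustrative assumptions.

```python
import socket


def parse_endpoints(connection_string):
    """Split 'host1:port,host2:port' into a list of (host, port) tuples."""
    endpoints = []
    for item in connection_string.split(","):
        item = item.strip()
        if not item:
            continue
        host, _, port = item.rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_endpoints(connection_string, timeout=3.0):
    """Map each 'host:port' in the connection string to its reachability."""
    return {
        f"{host}:{port}": is_port_open(host, port, timeout)
        for host, port in parse_endpoints(connection_string)
    }
```

For example, pass the bootstrap server or ZooKeeper connection string from the Client information page to check_endpoints. A False value for an endpoint points to a security group, routing, or DNS issue between the instance and that endpoint.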

Error: KafkaAction failed to send a message to the specified bootstrap servers. SSL handshake failed.

This error occurs when AWS IoT Core has an issue during the TLS handshake with the Amazon MSK cluster. If you receive this error, then you must use certificates that AWS Private Certificate Authority (AWS Private CA) issued. You can add AWS Private CA certificates to a key store and to the AWS IoT rule. For more information, see Mutual TLS authentication.
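
To see whether a TLS handshake with a broker succeeds from inside the Amazon VPC, you can try one from a client machine. The following Python sketch uses the standard ssl module. The function name and parameters are illustrative assumptions, and the optional certfile and keyfile arguments stand in for a client certificate and private key that you exported in PEM format. A successful handshake from a client machine doesn't guarantee that the AWS IoT rule's own key store is correct.

```python
import socket
import ssl


def check_tls_handshake(host, port, cafile=None, certfile=None,
                        keyfile=None, timeout=5.0):
    """Attempt a TLS handshake and return (succeeded, detail).

    On success, detail is the negotiated TLS version. On failure, detail
    is the exception type and message.
    """
    # With cafile=None, the system's default trusted CAs are used.
    context = ssl.create_default_context(cafile=cafile)
    if certfile:
        # Present a client certificate, as mutual TLS requires.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw_sock:
            with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
                return True, tls_sock.version()
    except OSError as exc:
        # ssl.SSLError and certificate verification errors subclass OSError.
        return False, f"{type(exc).__name__}: {exc}"
```

If the call returns False, then the detail string usually shows whether the failure is a network issue (connection refused or timed out) or a certificate issue.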

Error: KafkaAction failed to send a message to the specified bootstrap servers. An unknown error occurred.
Error: KafkaAction failed to send a message to the specified bootstrap servers. No resolvable bootstrap urls given in bootstrap.servers.

To troubleshoot these errors, complete the preceding steps 1 through 13. If you still receive the error messages, then gradually increase the rate at which messages are published to the AWS IoT topic. If you still experience issues, then contact AWS Support.
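
One way to increase the publish rate gradually is to step through a schedule of rates. The following Python sketch builds such a schedule and drives a publish callback through it. The function names, the doubling factor, and the step duration are illustrative assumptions. The publish callback is a placeholder for whatever MQTT client you use to send messages to the AWS IoT topic.

```python
import time


def ramp_rates(start_per_sec, target_per_sec, factor=2.0):
    """Return publish rates that grow by factor until they reach the target."""
    if start_per_sec <= 0 or factor <= 1:
        raise ValueError("start_per_sec must be > 0 and factor must be > 1")
    rates = []
    rate = float(start_per_sec)
    while rate < target_per_sec:
        rates.append(rate)
        rate *= factor
    rates.append(float(target_per_sec))
    return rates


def publish_with_ramp(publish, rates, seconds_per_step=10, sleep=time.sleep):
    """Call publish(sequence_number) at each rate for seconds_per_step seconds.

    Returns the total number of messages sent.
    """
    sent = 0
    for rate in rates:
        interval = 1.0 / rate
        for _ in range(int(rate * seconds_per_step)):
            publish(sent)
            sent += 1
            sleep(interval)
    return sent
```

Watch the AWS IoT logs while the ramp runs. The rate at which the errors start to appear can help you narrow down the cause before you contact AWS Support.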

Related information

Apache Kafka

How to integrate AWS IoT Core with Amazon MSK

Deliver data at scale to Amazon Managed Streaming for Apache Kafka (Amazon MSK)

AWS OFFICIAL
Updated 5 months ago