Lambda Apache Kafka Trigger Connection Errors


I am testing Lambda connectivity to a self-managed Apache Kafka cluster and I am seeing inconsistent behavior in the trigger configuration.

Kafka version: 3.5.0 (Scala 2.13 build, kafka_2.13-3.5.0)

Setup: Kafka is installed and running in a public subnet of a VPC in the same Region and account as the Lambda function. I can complete the Kafka trigger setup in Lambda, and the trigger is created. As part of the trigger setup, I select the same VPC, subnet, and security group as the cluster. However, after a few minutes I see the following error message:

"Last processing result: PROBLEM: Connection error. Your VPC must be able to connect to Lambda and STS, as well as Secrets Manager if authentication is required. You can provide access by configuring PrivateLink or a NAT Gateway. For how to setup VPC endpoints/NAT gateway, please check https://aws.amazon.com/blogs/compute/setting-up-aws-lambda-with-an-apache-kafka-cluster-within-a-vpc/"

Following this guidance, I added interface VPC endpoints (AWS PrivateLink) for Lambda and STS in the same subnet, and the trigger started working.
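For reference, a minimal boto3-style sketch of that endpoint setup. The Region, VPC, subnet, and security group IDs are hypothetical placeholders, and the helper names are made up for illustration; the EC2 client is passed in so the parameters can be checked without AWS access. `PrivateDnsEnabled=True` makes the standard regional Lambda and STS hostnames resolve to the endpoints' private IPs, and the endpoints' security group must allow inbound HTTPS (443) from the trigger's ENI.

```python
"""Sketch: interface VPC endpoints (AWS PrivateLink) for Lambda and STS.

All IDs used with these helpers are hypothetical placeholders.
"""


def endpoint_params(region, vpc_id, subnet_ids, security_group_ids,
                    services=("lambda", "sts")):
    """Build keyword arguments for EC2 create_vpc_endpoint, one dict per service."""
    return [
        {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
            # Resolve the public service hostnames to the endpoint's private IPs.
            "PrivateDnsEnabled": True,
        }
        for service in services
    ]


def create_endpoints(ec2_client, params_list):
    """Call create_vpc_endpoint for each parameter dict; return the endpoint IDs."""
    endpoint_ids = []
    for params in params_list:
        response = ec2_client.create_vpc_endpoint(**params)
        endpoint_ids.append(response["VpcEndpoint"]["VpcEndpointId"])
    return endpoint_ids
```

With credentials configured, you would call something like `create_endpoints(boto3.client("ec2"), endpoint_params("us-east-1", "vpc-...", ["subnet-..."], ["sg-..."]))`. If the trigger uses authentication, the error message indicates a Secrets Manager endpoint is needed as well (`services=("lambda", "sts", "secretsmanager")`).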

Question: Since the subnet already routes to an internet gateway, shouldn't that allow connections to Lambda and STS, so that neither interface endpoints nor a NAT gateway would be required?

Question: If I remove the interface endpoints, the trigger immediately stops picking up new messages from the Kafka topic, but its status is not updated. After about 10-20 minutes, the status (Last processing result) changes back to the error above, though it occasionally reverts to "OK". After about an hour, the trigger state changes to "Disabled". Once the trigger is disabled, adding the interface endpoints back does not restore it to a working state; it has to be deleted and recreated before it works again.

EDIT: I found that I can edit the trigger and reselect the activate checkbox; this re-enables the trigger, and it starts working again.
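The same re-enable step can presumably be scripted with the Lambda API: calling `update_event_source_mapping` with `Enabled=True` should correspond to reselecting the activate checkbox. A minimal sketch, assuming you already know the mapping's UUID (for example from `list_event_source_mappings`); the helper name is hypothetical and the Lambda client is passed in.

```python
"""Sketch: re-enable a disabled Kafka event source mapping."""


def reenable_if_disabled(lambda_client, mapping_uuid):
    """Return True if the mapping was Disabled and an enable request was sent."""
    mapping = lambda_client.get_event_source_mapping(UUID=mapping_uuid)
    # LastProcessingResult holds messages such as "PROBLEM: Connection error. ..."
    print("State:", mapping.get("State"))
    print("Last processing result:", mapping.get("LastProcessingResult"))
    if mapping.get("State") != "Disabled":
        return False
    lambda_client.update_event_source_mapping(UUID=mapping_uuid, Enabled=True)
    return True
```

Usage, assuming configured credentials: `reenable_if_disabled(boto3.client("lambda"), "<mapping UUID>")`. Re-enabling only helps once the network path to Lambda and STS is back in place.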

Tags: Kafka Trigger

Asked by AWS 9 months ago (220 views)
2 Answers
Accepted Answer

Thanks to Riku's answer, I was able to do some additional research and find the underlying issues.

An internet gateway does not give every resource in the subnet internet access, as I had assumed. A resource must have a public IP address to use it. You can provide one by associating an Elastic IP address or by enabling automatic public IP assignment in the subnet settings; alternatively, a resource in a private subnet can reach the internet through a NAT gateway.

When the Lambda Kafka connector (the event source mapping) is created with a VPC and subnet specified, the service creates a new network interface (ENI) in that subnet: AWS Lambda VPC ENI-SMK-908217237612-d8783fb7-57eb-468e-b3f1-9236e6a3754a-9adc9e84-51a4-494a-8521-173c03449ffc

Even if the subnet is set to assign public IPs automatically, this ENI does not receive one.
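To check this yourself, a sketch that builds `describe_network_interfaces` filters for the trigger's ENIs and reports which have a public IP. It assumes the `AWS Lambda VPC ENI-SMK-` prefix shown above appears in the ENI's description field and that EC2's `*` wildcard works in that filter value; the helper names are hypothetical.

```python
"""Sketch: find the trigger's ENIs and report whether each has a public IP."""

# Assumption: the prefix from the ENI name above appears in the ENI description.
ESM_ENI_PREFIX = "AWS Lambda VPC ENI-SMK-"


def esm_eni_filter(subnet_id):
    """Filters for EC2 describe_network_interfaces in the given subnet."""
    return [
        {"Name": "subnet-id", "Values": [subnet_id]},
        {"Name": "description", "Values": [ESM_ENI_PREFIX + "*"]},
    ]


def public_ip_report(network_interfaces):
    """Map each ENI ID to its public IP, or None when no public IP is associated."""
    report = {}
    for eni in network_interfaces:
        association = eni.get("Association") or {}
        report[eni["NetworkInterfaceId"]] = association.get("PublicIp")
    return report
```

Usage, assuming configured credentials: `public_ip_report(ec2.describe_network_interfaces(Filters=esm_eni_filter("subnet-..."))["NetworkInterfaces"])`. A `None` value is the situation described above.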

This ENI must be able to reach the Lambda and STS services. There are three ways to accomplish this:

  1. Create interface VPC endpoints for Lambda and STS. These use AWS PrivateLink, so the traffic does not traverse the public internet.
  2. Make sure the ENI is created in a private subnet whose route table sends internet-bound traffic to a NAT gateway. The ENI then reaches Lambda and STS through the NAT gateway, which uses its own public IP and the internet gateway attached to the public subnet. Traffic between the connector and the Lambda and STS services uses the public internet.
  3. Assign a public IP to the Lambda ENI in the public subnet. Because a public subnet's default route points to the internet gateway rather than a NAT gateway, the ENI needs its own public IP to use the internet gateway. Traffic between the connector and the Lambda and STS services uses the public internet.
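For option 3, a minimal sketch of allocating an Elastic IP and associating it with the trigger's ENI. It assumes, as described above, that the service-created ENI accepts an Elastic IP association; if the service later replaces the ENI, the address would presumably need to be associated again. The helper name is hypothetical and the EC2 client is passed in.

```python
"""Sketch of option 3: give the trigger's ENI a public IP via an Elastic IP."""


def attach_elastic_ip(ec2_client, network_interface_id):
    """Allocate a VPC Elastic IP and associate it with the given ENI.

    Returns (public_ip, association_id).
    """
    allocation = ec2_client.allocate_address(Domain="vpc")
    response = ec2_client.associate_address(
        AllocationId=allocation["AllocationId"],
        NetworkInterfaceId=network_interface_id,
    )
    return allocation["PublicIp"], response["AssociationId"]
```

Usage, assuming configured credentials: `attach_elastic_ip(boto3.client("ec2"), "eni-...")`, using the ENI ID found in the subnet.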

The guide linked from the connector's error message (https://aws.amazon.com/blogs/compute/setting-up-aws-lambda-with-an-apache-kafka-cluster-within-a-vpc/) should therefore state, in its section on Kafka clusters in a public subnet, that a public IP must be manually assigned to the automatically created Lambda ENI so that it can reach the Lambda and STS services.


Answered by AWS 9 months ago; reviewed by an EXPERT a month ago

Hello.

Question: Since the subnet already routes to an internet gateway, shouldn't that allow connections to Lambda and STS, so that neither interface endpoints nor a NAT gateway would be required?

Placing Lambda in a public subnet of a VPC does not normally give it a public IP address, and reaching AWS APIs through an internet gateway requires one.
Therefore, a NAT gateway or VPC endpoints must be configured.
You can confirm this by checking the Lambda ENI: it has no public IP address.
https://repost.aws/knowledge-center/internet-access-lambda-function
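To tell which case a given subnet falls into, a sketch that inspects the active default (0.0.0.0/0) route in the subnet's route table, as returned by EC2 `describe_route_tables`. The helper name and return labels are hypothetical.

```python
"""Sketch: classify how a subnet reaches the internet from its route table."""


def default_route_target(routes):
    """Return "igw", "nat", or None based on the subnet's active 0.0.0.0/0 route.

    `routes` is the "Routes" list of one route table from describe_route_tables.
    """
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State") != "active":  # skip blackhole routes
            continue
        if route.get("NatGatewayId"):
            return "nat"
        if (route.get("GatewayId") or "").startswith("igw-"):
            return "igw"
    return None
```

A result of `"igw"` means an ENI needs its own public IP (or VPC endpoints); `"nat"` means traffic can leave through the NAT gateway; `None` leaves VPC endpoints as the only option. To fetch the routes, filter `describe_route_tables` on `association.subnet-id`; a subnet with no explicit association uses the VPC's main route table.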

If I remove the interface endpoints, the trigger immediately stops picking up new messages from the Kafka topic, but its status is not updated. After about 10-20 minutes, the status (Last processing result) changes back to the error above, though it occasionally reverts to "OK". After about an hour, the trigger state changes to "Disabled". Once the trigger is disabled, adding the interface endpoints back does not restore it to a working state; it has to be deleted and recreated before it works again. EDIT: I found that I can edit the trigger and reselect the activate checkbox; this re-enables the trigger, and it starts working again.

As you noticed, the status takes some time to update, but the trigger returns to normal once the network settings are correct.

Answered by an EXPERT 9 months ago
  • Actually, the Lambda function is not connected to a VPC. It is the event source mapping from the Kafka topic that needs either a NAT gateway or interface endpoints to reach Lambda and STS. However, this did point me in the right direction, so thank you!
