Health check failing only after enabling VPC connector


Hello, I have two App Runner services. Both serve HTTP requests on port 3000, both pass TCP health checks, and both work fine. I'm very happy with them.

However, I decided to configure a VPC connector for egress so my containers can access VPC resources. I created a VPC connector with all available subnets in my region (three total) and the default security group of the default VPC. I updated both App Runner services to use the VPC connector: one works completely fine, and the other does not pass health checks. I've tried tearing everything down and starting again, but the result is the same: it's always the same image that fails and the same image that passes. I don't know what could be causing this, since both work fine without the VPC connector configured.

Attempting to create the problematic service without VPC connector (working):

10-17-2024 09:58:53 PM [AppRunner] Deployment with ID : <id> started. Triggering event : SERVICE_CREATE
10-17-2024 09:58:53 PM [AppRunner] Deployment Artifact: [Repo Type: ECR], [Image URL: <account_id>.dkr.ecr.eu-west-2.amazonaws.com/<project-x>], [Image Tag: <tag>]
10-17-2024 09:59:10 PM [AppRunner] Pulling image <account_id>.dkr.ecr.eu-west-2.amazonaws.com/<project-x> from ECR repository.
10-17-2024 09:59:11 PM [AppRunner] Successfully pulled your application image from ECR.
10-17-2024 09:59:22 PM [AppRunner] Provisioning instances and deploying image for publicly accessible service.
10-17-2024 09:59:32 PM [AppRunner] Performing health check on protocol `TCP` [Port: '3000'].
10-17-2024 10:00:14 PM [AppRunner] Health check is successful. Routing traffic to application.
10-17-2024 10:01:03 PM [AppRunner] Successfully routed incoming traffic to application.
10-17-2024 10:01:03 PM [AppRunner] Successfully routed incoming traffic to application.
10-17-2024 10:01:04 PM [AppRunner] Deployment with ID : <id> completed successfully.
10-17-2024 10:01:04 PM [AppRunner] Deployment with ID : <id> completed successfully.

Attempting to create the problematic service with identical configuration + a VPC connector (not working):

10-17-2024 09:49:22 PM [AppRunner] Deployment with ID : <id> started. Triggering event : SERVICE_CREATE
10-17-2024 09:49:22 PM [AppRunner] Deployment Artifact: [Repo Type: ECR], [Image URL: <account_id>.dkr.ecr.eu-west-2.amazonaws.com/<project-x>], [Image Tag: <tag>]
10-17-2024 09:49:23 PM [AppRunner] Enabling VPC connection for outgoing traffic.
10-17-2024 09:49:23 PM [AppRunner] Successfully enabled VPC connection for outgoing traffic.
10-17-2024 09:49:39 PM [AppRunner] Pulling image <account_id>.dkr.ecr.eu-west-2.amazonaws.com/<project-x> from ECR repository.
10-17-2024 09:49:40 PM [AppRunner] Successfully pulled your application image from ECR.
10-17-2024 09:49:51 PM [AppRunner] Provisioning instances and deploying image for publicly accessible service.
10-17-2024 09:50:02 PM [AppRunner] Performing health check on protocol `TCP` [Port: '3000'].
10-17-2024 09:56:12 PM [AppRunner] Health check failed on protocol `TCP` [Port: '3000']. Check your configured port number. For more information, see the application logs.
10-17-2024 09:58:10 PM [AppRunner] Deployment with ID : a5939dd1c08a469a84a2e50e7aa5475c failed. Failure reason : Health check failed.

No application logs are published to CloudWatch.

Any thoughts/assistance would be appreciated! Thank you!

4 Answers

Double-check the default security group. It is usually very restrictive for inbound connectivity. Use your own security group: best practice is to avoid both the default VPC and any default security groups.

EXPERT
answered 7 months ago
  • Hey thanks for your answer. The security group is currently the same for ingress and egress - 0.0.0.0/0 and ::/0 - all protocols/ports.


Based on the information provided, it appears that the issue is related to the VPC connector configuration for one of your App Runner services. Here are some potential reasons and suggestions to troubleshoot the problem:

  1. Security Group Configuration: Even though you're using the default VPC security group, it's possible that the security group rules are not allowing the necessary inbound traffic for the health check. Ensure that the security group associated with your VPC connector allows inbound traffic on port 3000 (the port your application is listening on).

  2. Network ACLs: Check if there are any Network ACLs in your VPC that might be blocking the health check traffic to your application.

  3. Application Behavior: Since one service works and the other doesn't when using the VPC connector, it's possible that the problematic service is behaving differently when it detects it's running in a VPC environment. Review your application code to ensure it's not altering its behavior based on the network environment.

  4. Subnet Configuration: Verify that the subnets you've selected for your VPC connector have the necessary route table entries to allow traffic to and from your application.

  5. DNS Resolution: Ensure that DNS resolution and DNS hostnames are enabled for your VPC, as App Runner may rely on DNS for health checks.

  6. Application Startup Time: If your application takes longer to start up when running in the VPC environment, it might not be ready when the health check is performed. You could try increasing the health check interval or timeout in your App Runner service configuration.

  7. VPC Endpoint: If your application depends on other AWS services, ensure you have the necessary VPC endpoints set up for those services.

  8. Logs: Although you mentioned that no application logs are published to CloudWatch, double-check your CloudWatch settings to ensure logging is properly configured. Application logs could provide valuable insights into why the health check is failing.

To further troubleshoot, you could try:

  • Temporarily changing the health check from TCP to HTTP if your application supports it, which might provide more detailed error information.
  • Deploying a simple test application that just responds to health checks to isolate whether the issue is with your application or the VPC configuration.
  • Comparing the detailed configurations of both services to identify any subtle differences that might be causing the issue.
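For the "simple test application" suggestion above, something like the following could stand in for the real image. This is only a minimal sketch using Python's standard library; `HealthHandler` and `make_server` are hypothetical names, and port 3000 is taken from the question. If this image passes health checks behind the same VPC connector, the problem is more likely in the application's startup than in the network path to the port.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer every GET with 200 so both TCP and HTTP health checks can pass."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Print to stdout so each request shows up in the container's logs.
        print("request:", fmt % args, flush=True)


def make_server(port=3000):
    # Bind to all interfaces; listening on 127.0.0.1 only would make the
    # port unreachable from outside the container.
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)


# To run as the container entrypoint:
#   make_server().serve_forever()
```

The server does nothing at startup except bind the port, so it has no outbound dependencies that could delay or block it.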

If none of these suggestions resolve the issue, you may need to contact AWS support for more in-depth troubleshooting of your specific configuration.

answered 7 months ago
EXPERT
reviewed 7 months ago

Hey, thanks for your answer. The security group is currently the same for ingress and egress - 0.0.0.0/0 and ::/0 - all protocols/ports. I also tried changing the health check to HTTP and had the same results (one fails, one passes). DNS resolution and DNS hostnames are enabled for the VPC. Honestly, I struggle to see how this could be a VPC issue, since one App Runner service works completely fine using the same VPC connector. That fact eliminates a lot of possibilities.

answered 7 months ago

Ok so after posting this I think I managed to solve it (somewhat) - but I think there may be some room for improvement with the error messages provided by AWS.

I created a brand new VPC so I could try connecting the App Runner service to it instead of the default one. I use Terraform for pretty much everything, so previously I had created the VPC connector for my default VPC with Terraform. This time, while I created the VPC with Terraform, I used the dashboard to create the App Runner service along with the VPC connector. I created this new VPC with public subnets only, and when creating the service, App Runner would not let me choose the public subnets, only private subnets (of which there were none).

This immediately seemed strange to me, because the VPC connector I created via Terraform for the default VPC was using public subnets (my default VPC has only public subnets). I went to check again - yes, my original VPC connector is definitely using public subnets. I've now searched the documentation for information relating to subnets and I found this:

When selecting a subnet for your VPC, ensure that you choose a private subnet, not a public subnet.

It appears that the AWS Dashboard prevents you from accidentally selecting a public subnet, but the AWS API does not.
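Since the API accepts public subnets, a check before creating the connector might catch the mistake. Here is a rough sketch, assuming the usual definition of a public subnet (its effective route table has a default route to an internet gateway). `public_subnet_ids` is a hypothetical helper, and the input is assumed to be shaped like the `RouteTables` field of the EC2 DescribeRouteTables response; the IDs in the example are made up.

```python
def public_subnet_ids(subnet_ids, route_tables):
    """Return the subnets whose effective route table has a default
    route pointing at an internet gateway (igw-...)."""

    def has_igw_default_route(table):
        for route in table.get("Routes", []):
            is_default = (
                route.get("DestinationCidrBlock") == "0.0.0.0/0"
                or route.get("DestinationIpv6CidrBlock") == "::/0"
            )
            if is_default and route.get("GatewayId", "").startswith("igw-"):
                return True
        return False

    main_table = None
    explicit = {}
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("Main"):
                main_table = table
            elif "SubnetId" in assoc:
                explicit[assoc["SubnetId"]] = table

    public = []
    for subnet_id in subnet_ids:
        # Subnets without an explicit association use the VPC's main table.
        table = explicit.get(subnet_id, main_table)
        if table is not None and has_igw_default_route(table):
            public.append(subnet_id)
    return public


# Made-up data: the main table routes to an internet gateway, while one
# subnet is explicitly associated with a table that routes to a NAT gateway.
example_tables = [
    {
        "Associations": [{"Main": True}],
        "Routes": [
            {"DestinationCidrBlock": "172.31.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0example"},
        ],
    },
    {
        "Associations": [{"Main": False, "SubnetId": "subnet-private"}],
        "Routes": [
            {"DestinationCidrBlock": "172.31.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example"},
        ],
    },
]

print(public_subnet_ids(["subnet-a", "subnet-private"], example_tables))
# prints ['subnet-a']
```

In a default VPC like mine, every subnet would come back as public, which is the situation the dashboard refuses and the API lets through.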

I created private subnets and a NAT gateway for my new VPC using Terraform, and when I went back to App Runner it allowed me to create the VPC connector with the private subnets. Now, when I create the App Runner service, it starts successfully.

So it seems the answer is that the VPC connector needs to be set up in private subnets only. Is there a use case for public subnets? If not, I think the AWS API should not allow selecting them. Another thing I don't quite understand is why one service consistently worked while the other consistently failed. This is what really threw me off while debugging.
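Nothing in this thread confirms why only one image was affected. One guess, and it is only a guess: with a VPC connector, the service's outbound traffic leaves through the connector's subnets, and from a public subnet without a NAT gateway that traffic generally cannot reach the internet. An image that calls an external endpoint during startup could then hang before it ever listens on port 3000, while an image with no outbound calls would be unaffected. A small probe like the one below, run early in the container's startup, could test that theory. `can_reach` is a hypothetical helper, and the target host in the comment is only a placeholder.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example use at startup (placeholder target):
#   print("outbound ok:", can_reach("example.com", 443), flush=True)
```

If the probe reports failure only when the connector is attached, outbound access from the subnets is the likely culprit rather than the health check itself.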

answered 7 months ago
