ECS Service Connect: Nginx upstream errors when using a DNS name


I am deploying Nginx and Node.js services on ECS. Nginx listens for requests from the ALB and forwards them to Node.js. Nginx uses upstream and proxy_pass to forward requests to the Node.js service. An issue occurs with the upstream connection when I update the Nginx or Node.js service. For example:

  1. Update the Node.js service => upstream error => update the Nginx service => works fine.
  2. Everything works => update the Nginx service => upstream error => update the Nginx service again => works fine.

This is the Nginx error message:

failed (104: Connection reset by peer) while reading response header from upstream

I also tried removing the upstream configuration from the nginx.conf file and using the Node.js Service Connect endpoint directly, along with the resolver from /etc/hosts. However, a different error occurred:

failed (99: Cannot assign requested address) while connecting to upstream

I know the Service Connect feature updates the /etc/hosts file, so the first case may be expected, but I don't understand what happens in the second case. If anyone has encountered a similar situation, please share!

1 Answer

The issues you're experiencing with ECS Service Connect and Nginx are related to service discovery and DNS resolution during service updates. Let's address each scenario:

  1. Updating the Node.js service causing upstream errors: This behavior is expected. When you update the Node.js service, new tasks are created with potentially different IP addresses. Nginx resolves the hostnames in an upstream block only once, when it starts or reloads its configuration, so it keeps using the addresses it resolved earlier. Updating the Nginx service afterwards starts new Nginx tasks that resolve the name again and pick up the current Node.js endpoints, which resolves the issue.

  2. Updating the Nginx service causing upstream errors: This scenario is less straightforward. The error might occur because the Nginx service update temporarily disrupts the service discovery mechanism. When you update Nginx, it may briefly lose its ability to resolve the Node.js service's DNS name. Updating Nginx again allows it to re-establish the connection and resolve the DNS names correctly.

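Regarding the error "Cannot assign requested address" when using the Node.js Service Connect endpoint directly: On Linux, error 99 (EADDRNOTAVAIL) during a connect is reported by the local network stack, typically when it cannot assign a local address or ephemeral port for the outgoing connection. It could also mean that Nginx is trying to connect to an address that is no longer usable, for example because the DNS resolution is cached and Nginx is attempting to use an outdated IP address.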

To mitigate these issues, consider the following approaches:

  1. Use DNS names instead of IP addresses in your Nginx configuration. This allows Nginx to resolve the current IP addresses of your Node.js services dynamically.

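  2. Configure retries and timeouts in Nginx to handle temporary connection issues during service updates, for example with proxy_next_upstream, proxy_next_upstream_tries, and proxy_connect_timeout. Nginx has no built-in backoff, and a retry generally only helps when there is more than one upstream address to try.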

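  3. Configure Nginx to use a shorter DNS cache timeout by adding a resolver directive with the valid parameter (for example, valid=10s). In open source Nginx, a name is generally re-resolved at runtime only when proxy_pass uses a variable (or when an upstream server uses the resolve parameter, in versions that support it); a literal hostname is resolved once at startup. Note that the resolver queries DNS servers directly and does not read /etc/hosts.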

  4. Consider using ECS Service Connect's built-in service discovery features instead of relying on custom Nginx upstream configurations. Service Connect provides automatic service discovery and load balancing between tasks.

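  5. Use health checks to detect and remove unhealthy upstream servers. Open source Nginx supports passive checks through the max_fails and fail_timeout parameters of the server directive; active health checks (the health_check directive) require NGINX Plus.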
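
As a rough illustration of approaches 1 to 3, the configuration below is a minimal sketch rather than a drop-in file. It assumes the Node.js service is reachable under a name that is published in DNS (for example through ECS service discovery), because the Nginx resolver does not read /etc/hosts and so may not see names that exist only there. The hostname node-api.example.internal, port 3000, and the timeout values are hypothetical placeholders, and 169.254.169.253 is assumed to be the Amazon-provided VPC DNS resolver reachable from the Nginx task.

```nginx
# Minimal sketch, not a drop-in config. Assumptions (not from the question):
#   - "node-api.example.internal" and port 3000 are hypothetical placeholders
#     for a DNS-published name and the port of the Node.js service.
#   - 169.254.169.253 is the Amazon-provided VPC DNS resolver and is
#     reachable from the Nginx task.
events {}

http {
    # Cache DNS answers for at most 10 seconds, overriding the record TTL.
    resolver 169.254.169.253 valid=10s;
    resolver_timeout 5s;

    server {
        listen 80;

        location / {
            # Using a variable makes Nginx resolve the name at request time
            # (through the resolver above) instead of once at startup.
            set $node_backend "http://node-api.example.internal:3000";
            proxy_pass $node_backend;

            proxy_set_header Host $host;

            # Fail fast when an endpoint is gone.
            proxy_connect_timeout 2s;
            proxy_read_timeout 30s;

            # If the name resolves to several addresses, retry the request
            # on another one after a connection error or timeout.
            proxy_next_upstream error timeout;
            proxy_next_upstream_tries 3;
        }
    }
}
```

You can check the syntax with nginx -t before deploying. If you keep an upstream block instead, the name is resolved once at startup, so the Nginx tasks would still need a reload or redeploy to pick up new addresses.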

By implementing these strategies, you should be able to create a more resilient setup that can handle service updates more gracefully.

answered 15 days ago
