Containers stop responding on public IP

Hi there!

I've got a few services (Prefect and MLflow) running in ECS. These have public IPs as well as private IPs with an associated service discovery endpoint.

Access via the public IP is unreliable (especially for MLflow, but both are affected). After a while, the services become unreachable on the public IP, although they still respond to requests on the private IP or the service discovery endpoint.

Spawning a new container seems to fix this: the services usually (not always) become reachable on their new public IPs right away, but they stop working again after a while. At that point, they stop responding to pings as well.

I think the containers themselves are probably fine, since they do respond to traffic that reaches them. I'm really at a loss here, and would be grateful for any input.

// R

  • Can you elaborate on the networking configuration? For example, which default gateway is configured, and what are the security group (SG) and network ACL (NACL) rules? What error do you get when you are unable to connect to the public IP? Please provide the curl -vI output.

1 Answer
Accepted Answer

The issue was that containers were allowed to (re)spawn in any subnet in the VPC (I think the subnet is picked at random?).

Some of these subnets had configurations that were not suitable for our services: traffic could get in, but the services were not permitted to respond. I confirmed this by spawning a bunch of containers and checking which ones I could access.
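For anyone wanting to repeat that kind of check, here is a minimal sketch of a reachability probe. It is not the script used here; the function names, the port, and the timeout are assumptions for illustration. It only tests whether a TCP connection can be opened, which is enough to tell apart containers that answer from ones that silently drop replies.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def probe(endpoints, timeout=3.0):
    """Map "host:port" to reachability for each (host, port) pair."""
    return {
        f"{host}:{port}": is_reachable(host, port, timeout)
        for host, port in endpoints
    }
```

Calling probe() with the public IP and service port of each freshly spawned container (for example, a hypothetical ("203.0.113.10", 5000)) and grouping the results by the subnet each container landed in should show whether the failures cluster in particular subnets.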

The solution is to recreate the services with more carefully selected subnets.
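One way to shortlist subnets is to check which of them have a default route to an internet gateway. The sketch below assumes the unsuitable configuration was a routing one, which this answer does not confirm; network ACL rules can produce the same symptom and are not checked here. It works on a list of route tables for a single VPC in the shape returned by the EC2 DescribeRouteTables API (for example, the "RouteTables" value from boto3's describe_route_tables). The function names are hypothetical.

```python
def has_internet_route(route_table):
    """True if the table has an IPv4 default route pointing at an internet gateway."""
    for route in route_table.get("Routes", []):
        if (route.get("DestinationCidrBlock") == "0.0.0.0/0"
                and route.get("GatewayId", "").startswith("igw-")):
            return True
    return False


def split_subnets_by_internet_route(subnet_ids, route_tables):
    """Return (routed, unrouted) subnet ID lists for one VPC's route tables."""
    explicit = {}      # subnet ID -> explicitly associated route table
    main_table = None  # used by subnets without an explicit association
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId"):
                explicit[assoc["SubnetId"]] = table
            if assoc.get("Main"):
                main_table = table

    routed, unrouted = [], []
    for subnet_id in subnet_ids:
        table = explicit.get(subnet_id, main_table)
        if table is not None and has_internet_route(table):
            routed.append(subnet_id)
        else:
            unrouted.append(subnet_id)
    return routed, unrouted
```

Only the subnets in the first list would then be passed to the ECS service's network configuration when it is recreated; the others are worth inspecting before any task with a public IP is placed in them.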

Richard
answered 19 days ago
