Containers stop responding on public IP


Hi there!

I've got a few services (Prefect and MLflow) running in ECS. These have public IPs as well as private IPs with an associated service discovery endpoint.

Access using the public IP is unreliable (especially for MLflow, but both are affected). After a while, the services become unreachable on the public IP; however, they still respond to requests on the private IP or the service discovery endpoint.

Spawning a new container seems to fix this: the services usually (not always) become reachable on their new public IPs right away, but they stop working again "after a while". At that point, they stop responding to pings as well.

I think the containers themselves are probably fine, since they do respond to traffic that reaches them. I'm really at a loss here, and would be grateful for any input.

// R

  • Can you elaborate on the networking configuration? For example, which default gateway is configured? What are the SG and NACL rules? What error do you get when you are unable to connect to the public IP? Please provide the curl -vI output.

Richard
asked 1 month ago, 221 views
1 Answer
Accepted Answer

The issue was that containers were allowed to (re)spawn in any subnet in the VPC (I think it's random?).

Some of these subnets had configurations that were not suitable for our services: traffic could get in, but the services were not permitted to respond. I confirmed this by spawning a bunch of containers and checking which ones I could access.
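Checking which containers are reachable can be scripted. Below is a minimal sketch, assuming the services accept plain TCP connections on a known port. The function name is my own, and the address and port in the usage comment are placeholders, not values from this setup.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake; the connection is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


# Hypothetical usage, with a placeholder public IP and port:
#   is_reachable("203.0.113.10", 5000)
```

Running this against the public IP of each freshly spawned task, and noting the subnet each task landed in, should show whether the failures cluster in particular subnets.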

The solution is to recreate the services with more carefully selected subnets.
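I did not record exactly what was wrong with the unsuitable subnets. One common cause of "traffic gets in, replies do not get out" is a subnet whose route table has no default route to an internet gateway (restrictive outbound NACL rules are another). The sketch below covers only the route table case and is an assumption, not a confirmed diagnosis. It works on plain dictionaries shaped like the `RouteTables` list returned by the EC2 DescribeRouteTables API; the function names are hypothetical.

```python
def has_igw_default_route(route_table):
    """True if the route table has an active 0.0.0.0/0 route to an internet gateway."""
    for route in route_table.get("Routes", []):
        gateway_id = route.get("GatewayId") or ""
        if (route.get("DestinationCidrBlock") == "0.0.0.0/0"
                and gateway_id.startswith("igw-")
                and route.get("State", "active") == "active"):
            return True
    return False


def split_subnets(subnet_ids, route_tables):
    """Split subnet IDs into (public, other) based on their effective route table.

    A subnet without an explicit association falls back to the VPC's main
    route table.
    """
    explicit = {}
    main = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId"):
                explicit[assoc["SubnetId"]] = table
            elif assoc.get("Main"):
                main = table

    public, other = [], []
    for subnet_id in subnet_ids:
        table = explicit.get(subnet_id, main)
        if table is not None and has_igw_default_route(table):
            public.append(subnet_id)
        else:
            other.append(subnet_id)
    return public, other
```

Only the subnets in the first list would then go into the service's network configuration when it is recreated. NACLs and security groups still need to be checked separately.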

Richard
answered 1 month ago
