Containers stop responding on public IP

Hi there!

I've got a few services (Prefect and MLflow) running in ECS. These have public IPs as well as private IPs with an associated service discovery endpoint.

Access using the public IP is shaky (especially for MLflow, but both are affected). After a while, the services become unreachable via the public IP; however, they still respond to requests via the private IP or the service discovery endpoint.

Spawning a new container seems to fix this, and services (usually, not always) become reachable on their new public IPs right away, but they stop working again "after a while". At this point, the services stop responding to pings as well.

I think the containers themselves are probably fine, since they do respond to traffic that reaches them. I'm really at a loss here, and would be grateful for any input.

// R

  • Can you elaborate on the networking configuration? For example, which default gateway is configured, and what are the SG and NACL rules? What error do you get when you are unable to connect to the public IP? Please provide the curl -vI output.

1 Answer
1
Accepted Answer

The issue was that containers were allowed to (re)spawn in any subnet in the VPC (I think it's random?).

Some of these subnets had configurations that were not suitable for our services: traffic could get in, but the services were not permitted to respond. I confirmed this by spawning a bunch of containers and seeing which ones I could access.

The solution is to recreate the services with more carefully selected subnets.
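
For anyone hitting the same thing, here is a rough sketch of how the subnet selection could be checked before recreating a service. It assumes one possible cause that the answer above does not confirm: a subnet whose route table has no active default route to an internet gateway, so replies cannot leave. It does not look at NACLs or security groups. The function names are hypothetical; the input is assumed to have the shape of the `RouteTables` list returned by boto3's EC2 `describe_route_tables`, filtered to a single VPC.

```python
from typing import Any, Optional


def has_igw_default_route(route_table: dict[str, Any]) -> bool:
    """Return True if the table has an active 0.0.0.0/0 route to an internet gateway."""
    for route in route_table.get("Routes", []):
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("GatewayId", "").startswith("igw-")
            and route.get("State", "active") == "active"
        ):
            return True
    return False


def public_subnets(
    subnet_ids: list[str], route_tables: list[dict[str, Any]]
) -> list[str]:
    """Keep only the subnets whose effective route table reaches an internet gateway.

    Assumes all route tables belong to the same VPC, so there is one main table.
    """
    explicit: dict[str, dict[str, Any]] = {}
    main: Optional[dict[str, Any]] = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId"):
                explicit[assoc["SubnetId"]] = table
            elif assoc.get("Main"):
                main = table

    result = []
    for subnet_id in subnet_ids:
        # Subnets without an explicit association fall back to the main table.
        table = explicit.get(subnet_id, main)
        if table is not None and has_igw_default_route(table):
            result.append(subnet_id)
    return result


def awsvpc_network_configuration(
    subnets: list[str], security_groups: list[str]
) -> dict[str, Any]:
    """Build the networkConfiguration argument used when creating an ECS service."""
    return {
        "awsvpcConfiguration": {
            "subnets": subnets,
            "securityGroups": security_groups,
            "assignPublicIp": "ENABLED",
        }
    }
```

The filtered list can then be passed as the subnets of the recreated service, so tasks only land in subnets that passed the check instead of anywhere in the VPC.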

Richard
answered a month ago
