ECS tasks: CPU hard limit without increasing CPU reservation


Hi :)

I am currently trying to resolve an issue with our ECS EC2-based cluster. Our task definitions use the container-level soft CPU limit (set to 50 CPU units), but do not use the task-level hard CPU limit. We have more than 2,000 services running, each with a single task. New revisions of these tasks are redeployed at a very high rate and at the same time, and on startup they often reach CPU usage above 1000%. This makes entire EC2 instances unresponsive, forcing us to restart the whole machine. For now we have worked around this with an alarm and a Lambda that quickly reboots failing, unresponsive instances.

But this is only a temporary fix, not a solution. What we would like to achieve is to limit each task's CPU usage so that it cannot exceed the soft limit by a factor of 10 or more. I have found a way to do this using the hard CPU limit, but that solution is also not great, mainly for the following reasons:

  1. Even with the soft limit, our tasks use at most 50% of this reservation, but the hard limit's minimum value for ECS on EC2 is 128 units (compared to the current 50).
  2. The hard limit automatically increases the reservation value for the task, meaning that setting the limit to 128 for all 2,000+ services/tasks would require us to run more than double the number of EC2 machines, with no actual gain, since our cluster's utilization currently hovers around 5-10%.
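To make point 2 concrete, here is the rough capacity math behind "more than double", using only the figures from the question (2,000 services, 50 units reserved today, 128-unit minimum hard limit; in ECS, 1024 CPU units equal one vCPU):

```python
# Capacity math from the question's figures (ECS: 1024 CPU units = 1 vCPU).
services = 2000
soft_limit_units = 50        # current per-container reservation
hard_limit_min_units = 128   # minimum task-level hard limit on the EC2 launch type

current_reservation = services * soft_limit_units       # total units reserved today
with_hard_limit = services * hard_limit_min_units       # total units if hard limit is set
growth = with_hard_limit / current_reservation

print(current_reservation, with_hard_limit, round(growth, 2))
# 100000 256000 2.56
```

So the reserved capacity would grow roughly 2.56x, which matches the claim that the cluster fleet would need to more than double.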

So my question is: is there a way to limit the maximum CPU usage of each task/container without using the task-level hard limit? Our EC2 machines run Ubuntu.
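One direction I have been considering (not ECS-native, so treat this as a sketch rather than a recommendation): since the containers run under Docker on our own EC2 instances, a per-instance agent could cap each container with the Linux CFS quota, which Docker exposes as `--cpu-quota`/`--cpu-period` (and which maps to the cgroup file `cpu.cfs_quota_us`). Unlike the ECS hard limit, this does not change the task's reservation. The helper below converts ECS CPU units into a CFS quota, assuming the default 100 ms CFS period; the 10x cap and the function name are my own illustration, not an ECS API:

```python
# Hypothetical helper: translate ECS CPU units into the CFS quota value
# that Docker's --cpu-quota (cgroup cpu.cfs_quota_us) expects.
# Assumes the default 100 ms CFS period and ECS's 1024-units-per-vCPU scale.

CFS_PERIOD_US = 100_000   # default cpu.cfs_period_us (100 ms)
UNITS_PER_VCPU = 1024     # ECS convention: 1024 CPU units = 1 vCPU

def units_to_cfs_quota(cpu_units: int) -> int:
    """Microseconds of CPU time allowed per period for the given unit count."""
    return CFS_PERIOD_US * cpu_units // UNITS_PER_VCPU

# Cap each container at 10x its 50-unit soft limit (500 units, ~0.49 vCPU):
quota = units_to_cfs_quota(500)
print(quota)  # 48828
```

The resulting value could then be applied to a running container with `docker update --cpu-quota=<quota> <container-id>`, e.g. from a small daemon that watches for newly started containers on each instance. Whether that interacts safely with the ECS agent's own container management is something I have not verified.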

No answers