AWS ECS Tasks are killed by OOM killer without hitting memory limit


We have an ECS service that uses the FARGATE launch type. One of the tasks of this service is being killed due to memory usage, but when I check the container metrics, the maximum memory usage never reaches the limit. It always happens about 1.5 hours after task creation. We have other tasks with the same task definition that are not killed even though their memory usage is higher. We don't have any metrics other than those provided by CloudWatch.

ErrorMessage: 'reason': 'OutOfMemoryError: Container killed due to memory usage'

Can you please guide me on what could be causing this issue and how to improve things so the process does not shut down?

[Attached charts: MemoryUsage/Limit and Average Memory Usage for the containers]

  • Are you using EFS? EFS will consume some of the container's memory.

  • No, we are not using EFS. All tasks are stateless.

asked a year ago · 3,021 views
2 Answers

I saw this a week ago with a team I consulted with.

It sounds like you are running into an issue where an Amazon Elastic Container Service (ECS) task is being killed because of its memory usage. There could be a few possible reasons for this. Some things to consider include:

  • Task definition memory limit: Make sure that the task definition for the ECS task specifies an appropriate memory limit. If the limit is set too low, the task may be killed when it reaches it (a quick JVM-side check is sketched after this list).
  • Memory usage of other processes: Make sure that there are no other processes on the host that are using a large amount of memory and potentially causing the ECS task to be killed.
  • Memory leak in the application: If the application that is running in the ECS task is leaking memory, this could cause the task to be killed due to memory usage. Try analyzing the application's memory usage to see if there are any potential issues.
  • Host memory: Make sure that the host running the ECS task has enough memory available. If the host is running low on memory, this could cause the ECS task to be killed.
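
Since the comments below indicate the workload is a JVM application, one quick way to validate the first point is to log the limits the JVM itself has derived inside the container and compare them with the task definition. A minimal sketch (class name and output format are illustrative, not from this thread):

    // Minimal sketch: print the limits this JVM derived from the container so
    // they can be compared with the ECS task definition's memory setting.
    public class JvmLimitCheck {
        public static void main(String[] args) {
            long mib = 1024L * 1024L;
            Runtime rt = Runtime.getRuntime();
            System.out.printf("Max heap (effective -Xmx): %d MiB%n", rt.maxMemory() / mib);
            System.out.printf("Committed heap:            %d MiB%n", rt.totalMemory() / mib);
            System.out.printf("Available processors:      %d%n", rt.availableProcessors());
        }
    }

On Java 10 and later the JVM is container-aware by default, so the reported max heap should reflect the task's memory limit combined with flags such as -Xmx or -XX:MaxRAMPercentage.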
SeanSi
answered a year ago
  • Hey SeanSi,

    • We have a 16 GB memory limit for the tasks and give 65% of it to the Java heap. In total (heap + non-heap), Java uses ~14 GB of the container limit (a logging sketch for tracking this from inside the JVM follows these comments).
    • We are running only a Spring application in the container; there are no other processes except the ECS agent.
    • We run multiple tasks at a time, at least 6 per service with the same configuration. Only one of them was killed by the OOM killer; the others live until the next deployment, for example 3-4 days. Java is also configured to notify the team on its own OutOfMemoryError, and it never dies because of a memory leak in Java itself.
    • Tasks are running on Fargate, which is a managed service, so we are not managing host instances.
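
Since CloudWatch only shows the container-level memory metric, it can help to log the JVM's own heap, non-heap, and direct-buffer usage from inside the task and correlate the timestamps with the kill; native allocations outside these pools would then show up as a gap between the container metric and the JVM totals. A minimal standalone sketch using the standard java.lang.management MXBeans (interval and format are placeholders; in a Spring application this would more naturally be a @Scheduled method or a Micrometer gauge):

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Periodically logs heap, non-heap, and direct/mapped buffer usage so that
    // memory growth outside the Java heap is visible before the container is killed.
    public class JvmMemoryLogger {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            long mib = 1024L * 1024L;

            scheduler.scheduleAtFixedRate(() -> {
                long heapUsed = memory.getHeapMemoryUsage().getUsed() / mib;
                long nonHeapUsed = memory.getNonHeapMemoryUsage().getUsed() / mib;
                long bufferUsed = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)
                        .stream()
                        .mapToLong(BufferPoolMXBean::getMemoryUsed)
                        .sum() / mib;
                System.out.printf("heap=%d MiB, non-heap=%d MiB, buffers=%d MiB%n",
                        heapUsed, nonHeapUsed, bufferUsed);
            }, 0, 1, TimeUnit.MINUTES);
        }
    }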

After we enabled Java Native Memory Tracking, I figured out there is a bug in Java version 11.0.16. We were using OpenJDK, and OpenJDK deprecated Java 11 with this version. The issue was solved after we moved to Amazon Corretto 11.0.20.
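
For anyone debugging a similar gap between the container metric and the Java heap: Native Memory Tracking is enabled with the JVM flag -XX:NativeMemoryTracking=summary and read with jcmd <pid> VM.native_memory summary. A small sketch, with an illustrative class name, to confirm which JDK build is actually running inside the container after switching images:

    // Logs which JVM build is running inside the container, useful to confirm
    // that the image really switched from OpenJDK 11.0.16 to Corretto 11.0.20.
    public class RuntimeVersionCheck {
        public static void main(String[] args) {
            System.out.println("java.vendor          = " + System.getProperty("java.vendor"));
            System.out.println("java.runtime.version = " + System.getProperty("java.runtime.version"));
            System.out.println("java.vm.name         = " + System.getProperty("java.vm.name"));
        }
    }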

answered 6 months ago
