How do I troubleshoot OutOfMemory errors in Amazon ECS?

I want to troubleshoot memory usage issues in my Amazon Elastic Container Service (Amazon ECS) task. Or, the containers in my Amazon ECS task are exiting due to an “OutOfMemory” error.

Short description

By default, a container has no resource constraints and can use as many resources as the host's kernel scheduler allows. With Docker, you can control the amount of memory that a container uses. Be sure not to allow a running container to consume most of the host machine's memory. On Linux hosts, when the kernel detects that there isn't enough memory to perform important system functions, it raises an out-of-memory (OOM) condition and ends processes to free up memory.

With Docker, you can use either of the following memory limits:

  • Hard memory limits that allow the container to use no more than a certain amount of user or system memory
  • Soft limits that allow the container to use the necessary memory unless certain conditions, such as low memory or contention on the host machine, occur

When an Amazon ECS task ends because of OutOfMemory issues, you might receive the following error message in the Amazon ECS console. To view the message, choose the Task ID, and then refer to the Details section to view the container's details:

OutOfMemoryError: Container killed due to memory usage

In this case, a container in your task exits because the container's processes consume more memory than the amount that's allocated in the task definition.
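
The reason string that the console shows is also exposed per container by the Amazon ECS DescribeTasks API. The following Python sketch shows one way to pick out containers that stopped with an OutOfMemory reason from such a response. The function name, the sample response, and its values (including the exit code) are hypothetical and only illustrate the shape of the data. In practice the dictionary would come from a call such as boto3's `ecs.describe_tasks`.

```python
from typing import Any


def find_oom_containers(describe_tasks_response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one record per container whose reason starts with 'OutOfMemory'."""
    hits = []
    for task in describe_tasks_response.get("tasks", []):
        for container in task.get("containers", []):
            reason = container.get("reason", "")
            if reason.startswith("OutOfMemory"):
                hits.append(
                    {
                        "taskArn": task.get("taskArn"),
                        "container": container.get("name"),
                        "exitCode": container.get("exitCode"),
                        "reason": reason,
                    }
                )
    return hits


# Hypothetical response fragment (illustrative values only), shaped like
# the output of DescribeTasks for a single stopped task.
sample_response = {
    "tasks": [
        {
            "taskArn": "arn:aws:ecs:eu-west-1:555555555555:task/ECSFargate/0123456789abcdef0123456789abcdef",
            "lastStatus": "STOPPED",
            "containers": [
                {
                    "name": "example-container-name",
                    "exitCode": 137,
                    "reason": "OutOfMemoryError: Container killed due to memory usage",
                },
                {"name": "sidecar", "exitCode": 0},
            ],
        }
    ]
}

for hit in find_oom_containers(sample_response):
    print(f"{hit['container']} in {hit['taskArn']}: {hit['reason']}")
```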

Resolution

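To troubleshoot OutOfMemory errors in your Amazon ECS task, review how much memory the task's containers use compared with the memory that they reserve. For example, if Container Insights is turned on for the cluster, run the following CloudWatch Logs Insights query against the cluster's performance log group. Replace example-container-name and example-task-id with your own values:

stats max(MemoryUtilized) as mem, max(MemoryReserved) as memreserved by bin(5m) as period, TaskId, ContainerName
| sort period desc
| filter ContainerName like "example-container-name"
| filter TaskId = "example-task-id"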
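
The CloudWatch Logs GetQueryResults API returns the output of a query like the one above as rows of field/value pairs, with every value as a string. The following Python sketch flags the periods in which peak memory use came close to the reserved amount. The helper names, the 90% threshold, and the sample rows are assumptions made for illustration and aren't part of the original guidance.

```python
def rows_to_dicts(results: list[list[dict[str, str]]]) -> list[dict[str, str]]:
    """Convert Logs Insights rows ([{field, value}, ...]) into plain dicts."""
    return [{cell["field"]: cell["value"] for cell in row} for row in results]


def periods_near_limit(
    results: list[list[dict[str, str]]], threshold: float = 0.9
) -> list[tuple[str, float, float]]:
    """Return (period, mem, memreserved) where mem / memreserved >= threshold."""
    flagged = []
    for row in rows_to_dicts(results):
        # Skip rows that lack either value, for example when a container
        # has no memory reservation.
        if "mem" not in row or "memreserved" not in row:
            continue
        used = float(row["mem"])
        reserved = float(row["memreserved"])
        if reserved > 0 and used / reserved >= threshold:
            flagged.append((row["period"], used, reserved))
    return flagged


# Hypothetical query output (illustrative values only), in the shape of the
# "results" list from GetQueryResults.
sample_results = [
    [
        {"field": "period", "value": "2024-01-01 10:05:00.000"},
        {"field": "TaskId", "value": "example-task-id"},
        {"field": "ContainerName", "value": "example-container-name"},
        {"field": "mem", "value": "480"},
        {"field": "memreserved", "value": "512"},
    ],
    [
        {"field": "period", "value": "2024-01-01 10:00:00.000"},
        {"field": "TaskId", "value": "example-task-id"},
        {"field": "ContainerName", "value": "example-container-name"},
        {"field": "mem", "value": "200"},
        {"field": "memreserved", "value": "512"},
    ],
]

for period, used, reserved in periods_near_limit(sample_results):
    print(f"{period}: used {used:.0f} of {reserved:.0f} reserved")
```

A container that repeatedly appears in this list is a candidate for a higher memory setting or for a closer look at the application's memory use.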

To mitigate the risk of task instability because of OutOfMemory issues, complete the following steps:

  • Perform tests to understand the memory requirements of your application before placing the application in production. You can perform a load test on the container within a host or server. Then, check the memory usage of the containers using docker stats (from the Docker Docs website).
  • Be sure that your application runs only on hosts with adequate resources.
  • Limit the amount of memory that your container can use by setting appropriate hard and soft limits. Amazon ECS uses several parameters to allocate memory to tasks, including memoryReservation for soft limits and memory for hard limits. The values that you specify are subtracted from the available memory resources of the container instance where the container is placed. Note: The parameter memoryReservation isn't supported for Windows containers.
  • Turn on swap for containers with high transient memory demands. Doing so reduces the chance of OutOfMemory errors when the container is under high load. Note: For tasks that use the AWS Fargate launch type, the parameters maxSwap and sharedMemorySize aren't supported. Important: Be careful when you configure swap on your Docker hosts. Swap can slow down your application, but it helps prevent your application from running out of system memory.
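
As a rough illustration of the memory parameters in the list above, the following Python sketch builds a container definition of the kind that's passed in containerDefinitions when you register a task definition. The helper name, image, and sizes are hypothetical. The check reflects the Amazon ECS rule that, when both values are set, memory must be greater than memoryReservation.

```python
from typing import Any, Optional


def build_container_definition(
    name: str,
    image: str,
    hard_limit_mib: int,
    soft_limit_mib: int,
    max_swap_mib: Optional[int] = None,
) -> dict[str, Any]:
    """Build a container definition with a hard and a soft memory limit."""
    if soft_limit_mib >= hard_limit_mib:
        raise ValueError("memory (hard limit) must be greater than memoryReservation (soft limit)")
    definition: dict[str, Any] = {
        "name": name,
        "image": image,
        "memory": hard_limit_mib,  # hard limit, in MiB
        "memoryReservation": soft_limit_mib,  # soft limit, in MiB
    }
    if max_swap_mib is not None:
        # maxSwap isn't supported for tasks that use the Fargate launch type.
        definition["linuxParameters"] = {"maxSwap": max_swap_mib}
    return definition


# Hypothetical values for illustration only.
container = build_container_definition(
    name="example-container-name",
    image="example-image:latest",
    hard_limit_mib=512,
    soft_limit_mib=256,
    max_swap_mib=256,
)
print(container)
```

Values derived from load testing are a better starting point than the numbers shown here.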

To detect Amazon ECS tasks that ended because of OutOfMemory events, use the following AWS CloudFormation template. With this template, you can create an Amazon EventBridge rule, Amazon Simple Notification Service (Amazon SNS) topic, and an Amazon SNS topic policy. When you run the template, the template asks for an email list, topic name, and a flag to turn monitoring on or off:

AWSTemplateFormatVersion: 2010-09-09
Description: >
        - Monitor OOM Stopped Tasks with EventBridge rules with AWS CloudFormation.

Parameters:
  EmailList:
    Type: String
    Description: "Email to notify!"
    AllowedPattern: '[a-zA-Z0-9]+@[a-zA-Z0-9]+\.[a-zA-Z]+'
    Default: "mail@example.com"

  SNSTopicName:
    Type: String
    Description: "Name for the notification topic."
    AllowedPattern: '[a-zA-Z0-9_-]+'
    Default: "oom-monitoring-topic"

  MonitorStatus:
    Type: String
    Description: "Enable / Disable monitor."
    AllowedValues:
      - ENABLED
      - DISABLED
    Default: ENABLED

Resources:
  SNSMonitoringTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Endpoint: !Ref EmailList
          Protocol: email
      TopicName: !Sub ${AWS::StackName}-${SNSTopicName}
      
  SNSMonitoringTopicTopicPolicy:
    Type: AWS::SNS::TopicPolicy
    Properties:
      Topics:
        - !Ref SNSMonitoringTopic
      PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Sid: SnsOOMTopicPolicy
            Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: [  'sns:Publish' ]
            Resource: !Ref SNSMonitoringTopic
          - Sid: AllowAccessToTopicOwner
            Effect: Allow
            Principal:
              AWS: '*'
            Action: [  'sns:GetTopicAttributes',
                       'sns:SetTopicAttributes',
                       'sns:AddPermission',
                       'sns:RemovePermission',
                       'sns:DeleteTopic',
                       'sns:Subscribe',
                       'sns:ListSubscriptionsByTopic',
                       'sns:Publish',
                       'sns:Receive' ]
            Resource: !Ref SNSMonitoringTopic
            Condition:
              StringEquals:
                'AWS:SourceOwner': !Ref 'AWS::AccountId'
          
  EventRule:
    Type: AWS::Events::Rule
    Properties:
      Name: ECSStoppedTasksEvent
      Description: Triggered when an Amazon ECS Task is stopped
      EventPattern:
        source:
          - aws.ecs
        detail-type:
          - ECS Task State Change
        detail:
          desiredStatus:
            - STOPPED
          lastStatus:
            - STOPPED
          containers:
            reason:
              - prefix: "OutOfMemory"
      State: !Ref MonitorStatus
      Targets:
        - Arn: !Ref SNSMonitoringTopic
          Id: ECSOOMStoppedTasks
          InputTransformer:
            InputPathsMap:
              taskArn: $.detail.taskArn
            InputTemplate: >
                "Task '<taskArn>' was stopped due to OutOfMemory."

After you create the CloudFormation stack, check your email and confirm the Amazon SNS subscription. When a task ends because of an OutOfMemory issue, you get an email with a message similar to the following example:

"Task 'arn:aws:ecs:eu-west-1:555555555555:task/ECSFargate/0123456789abcdef0123456789abcdef' was stopped due to OutOfMemory."
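
To make the rule's behavior easier to follow, the following Python sketch approximates how the event pattern and input transformer in the template treat an ECS Task State Change event. It's a simplified local model for checking your understanding, not the EventBridge matching engine. The function names and the sample events are hypothetical.

```python
from typing import Any


def matches_oom_rule(event: dict[str, Any]) -> bool:
    """Approximate the template's EventPattern for a single event."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.ecs"
        and event.get("detail-type") == "ECS Task State Change"
        and detail.get("desiredStatus") == "STOPPED"
        and detail.get("lastStatus") == "STOPPED"
        # The prefix filter matches when any container's reason starts
        # with "OutOfMemory".
        and any(
            c.get("reason", "").startswith("OutOfMemory")
            for c in detail.get("containers", [])
        )
    )


def render_notification(event: dict[str, Any]) -> str:
    """Mimic the InputTemplate, which substitutes <taskArn> from the event."""
    task_arn = event["detail"]["taskArn"]
    return f"\"Task '{task_arn}' was stopped due to OutOfMemory.\""


# Hypothetical events (illustrative values only).
oom_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Task State Change",
    "detail": {
        "taskArn": "arn:aws:ecs:eu-west-1:555555555555:task/ECSFargate/0123456789abcdef0123456789abcdef",
        "desiredStatus": "STOPPED",
        "lastStatus": "STOPPED",
        "containers": [
            {"name": "app", "reason": "OutOfMemoryError: Container killed due to memory usage"}
        ],
    },
}
normal_stop_event = {
    "source": "aws.ecs",
    "detail-type": "ECS Task State Change",
    "detail": {
        "taskArn": "arn:aws:ecs:eu-west-1:555555555555:task/ECSFargate/fedcba9876543210fedcba9876543210",
        "desiredStatus": "STOPPED",
        "lastStatus": "STOPPED",
        "containers": [{"name": "app", "exitCode": 0}],
    },
}

for event in (oom_event, normal_stop_event):
    if matches_oom_rule(event):
        print(render_notification(event))
```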

Related information

How do I troubleshoot Amazon ECS tasks stopping or failing to start while my container exits?

AWS OFFICIAL
Updated 8 months ago
2 Comments

The article says that I might receive an error like OutOfMemoryError: Container killed due to memory usage but the word receive is ambiguous and misleading: there's nothing that sends anything to anyone; likely I myself must find a way to find this text and it would be useful if this post would describe where to search, because the AWS console is such a mess.

replied 9 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 9 months ago