"Task ran out of memory" from containerd, but neither the pod nor the container was killed


Hey guys.

Over the last few weeks I've hit several cases where containerd logged a "Task ran out of memory" error, but neither the pod nor the container was killed or restarted.

I could see that the pod's memory usage was skyrocketing at that moment and stabilized right after containerd logged "Task ran out of memory".

I expected to see something like "exit 137" or an increased pod restart counter, but there was nothing.

What does containerd actually do when it logs the "Task ran out of memory" error, and what condition triggers it?

I'm running an EKS v1.27 cluster with the amazon-eks-node-1.27-v20230711 AMI (ami-00f80984c1a72a9d1).

Kernel version: 5.10.199-190.747.amzn2.aarch64, Kubelet v1.27.7-eks-e71965b.

Can someone help me here?

Thanks!

Yechan
asked 3 months ago · 396 views
2 Answers

If a container exceeds its memory limit, it is terminated and, depending on the restart policy, it might be restarted. This is what typically leads to an "exit 137" status, which indicates the process was killed with SIGKILL, usually because of a memory issue. Please check the following:

1. Run kubectl describe pod <pod-name> and review the container state and events.
2. Look at the kubelet and containerd logs for more detailed information about the memory handling.
3. Ensure your EKS cluster, kubelet, and containerd are up to date with the latest patches.
4. Use tools like Prometheus or CloudWatch to monitor memory usage over time.
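The pod check above can also be scripted. Below is a minimal Python sketch, assuming the input is the JSON produced by `kubectl get pod <pod-name> -o json`. The function name `find_oom_evidence` is hypothetical; the fields it reads (`status.containerStatuses[].restartCount` and `lastState.terminated`) are standard pod status fields. It only reports what the kubelet recorded, so an empty result does not rule out an OOM kill of a process other than the container's main process.

```python
from typing import Any, Dict, List


def find_oom_evidence(pod: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Summarize restart counts and OOM indicators for each container in a pod.

    `pod` is the parsed JSON of a single pod object.
    """
    findings = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        # lastState.terminated is only present if a previous instance
        # of this container has exited.
        terminated = cs.get("lastState", {}).get("terminated") or {}
        oom_killed = (
            terminated.get("reason") == "OOMKilled"
            or terminated.get("exitCode") == 137
        )
        findings.append(
            {
                "container": cs.get("name"),
                "restart_count": cs.get("restartCount", 0),
                "oom_killed": oom_killed,
            }
        )
    return findings
```

To use it, load the saved `kubectl` output with `json.load` and pass the resulting dictionary to `find_oom_evidence`. A container with `restart_count` of 0 and `oom_killed` set to `False` matches the situation described in the question.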

Jagan
answered 3 months ago
Expert
reviewed 1 month ago
  • Hey Jagan. Thanks for the reply.

    Unfortunately, I could not find any exit 137 or restart history for the pod.

    The pod's memory usage definitely increased at that time, and it quickly stabilized once the OOM event occurred.

    I checked both the containerd OOM event and the pod memory usage through Datadog, by the way.

    Do you have any idea where I should look further? I don't think checking the EKS version and so on would help, though.
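One place to look further is the kernel's own OOM counters for the container's memory cgroup. This is offered as a possible explanation, not a confirmed diagnosis for this cluster: when a memory cgroup hits its limit, the kernel OOM killer picks a process inside that cgroup, and if the victim is a child process rather than the container's main process, the container keeps running with no exit 137 and no restart. The minimal sketch below parses the flat "key value" format used by `memory.events` (cgroup v2) and `memory.oom_control` (cgroup v1) and compares two snapshots. The function names are hypothetical, and the path to the file under `/sys/fs/cgroup` depends on the node's cgroup version and driver, so it is left out.

```python
from typing import Dict


def parse_flat_keyed(text: str) -> Dict[str, int]:
    """Parse lines of the form 'key value' into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank or malformed lines instead of failing.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def oom_kill_delta(before: str, after: str) -> int:
    """Return how many OOM kills occurred between two snapshots of the file."""
    old = parse_flat_keyed(before).get("oom_kill", 0)
    new = parse_flat_keyed(after).get("oom_kill", 0)
    return new - old
```

If `oom_kill` increases while the pod's restart counter stays flat, that would be consistent with a non-main process inside the container having been killed. The kernel log on the node names the process that was killed.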


Given the specific versions you provided (EKS v1.27, kernel version 5.10.199), it may be worth checking for any known issues or updates related to memory management in these versions. Could you please refer to the official amazon-eks-ami changelog for known issues in that version: https://github.com/awslabs/amazon-eks-ami/blob/master/CHANGELOG.md

Expert
answered 3 months ago
Expert
reviewed 1 month ago
