How do I troubleshoot Container Insights issues for my Amazon EKS clusters?
I encounter issues when I configure Amazon CloudWatch Container Insights for my Amazon Elastic Kubernetes Service (Amazon EKS) clusters.
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshooting errors for the AWS CLI. Also, make sure that you're using the most recent AWS CLI version.
Check your Container Insights installation
To check whether you correctly installed Container Insights on your Amazon EKS cluster, run the following command:
kubectl get pods -n amazon-cloudwatch
Then, run the following command for your pod:
kubectl describe pod pod-name -n amazon-cloudwatch
Note: Replace pod-name with the pod name.
Check the Events section of the command's output.
To check the logs of the CloudWatch agent pod, run the following command:
kubectl logs pod-name -n amazon-cloudwatch
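If the namespace contains several pods, it can help to filter the pod list down to the ones that need attention. The following is a minimal sketch, not part of the official procedure. The function name not_running_pods is hypothetical, and it assumes the default column order that kubectl get pods prints (NAME READY STATUS RESTARTS AGE).

```shell
# Print the name and status of every pod that is not in the Running state.
# Reads the table printed by `kubectl get pods` on standard input and
# assumes the default column order: NAME READY STATUS RESTARTS AGE.
not_running_pods() {
  # Skip the header row (NR > 1) and compare the third column (STATUS).
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Example usage (requires access to the cluster):
#   kubectl get pods -n amazon-cloudwatch | not_running_pods
```

You can then pass each pod name that the function prints to the kubectl describe pod and kubectl logs commands shown earlier.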
Install CloudWatch Observability as an Amazon EKS managed add-on
Use the Amazon EKS add-on to install Container Insights with enhanced observability for Amazon EKS.
Note: You can use the CloudWatch Observability EKS add-on only on Amazon EKS clusters that run Kubernetes version 1.23 or later.
To install CloudWatch Observability as a self-managed add-on, complete the following steps:
- To install cert-manager, run the following command:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.16.1/cert-manager.yaml
- To install the custom resource definitions (CRD), run the following command:
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/main/k8s-quickstart/cwagent-custom-resource-definitions.yaml | kubectl apply --server-side -f -
- To install the CloudWatch container agent operator, run the following commands:
ClusterName=my-cluster-name
RegionName=my-cluster-region
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/main/k8s-quickstart/cwagent-operator-rendered.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/g;s/{{region_name}}/'${RegionName}'/g' | kubectl apply -f -
Note: Replace my-cluster-name with your cluster name and my-cluster-region with your AWS Region. Set the two variables on their own lines before you run the curl command so that the shell expands them in the sed expression.
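To review the substituted manifest before you apply it, you can separate the substitution step from the kubectl apply step. The following is a minimal sketch under stated assumptions: render_manifest is a hypothetical helper name, and the cluster name and Region are assumed not to contain characters that are special to sed, such as / or &.

```shell
# Replace the {{cluster_name}} and {{region_name}} placeholders in a manifest
# template read from standard input.
#   $1 = cluster name
#   $2 = AWS Region
# Assumption: neither value contains "/" or "&", which sed would misinterpret.
render_manifest() {
  sed "s/{{cluster_name}}/$1/g;s/{{region_name}}/$2/g"
}

# Example usage (the local file names are placeholders):
#   render_manifest my-cluster-name my-cluster-region < cwagent-operator-rendered.yaml > rendered.yaml
#   kubectl apply -f rendered.yaml
```

Writing the result to a file first lets you inspect it for values that were not replaced before anything reaches the cluster.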
Troubleshoot metrics that don't appear on the AWS Management Console
If you don't see Container Insights metrics on the AWS Management Console, then confirm that you completed the Container Insights setup.
Troubleshoot Container Insights errors
Unauthorized panic: Cannot retrieve cadvisor data from kubelet
To resolve this issue, make sure to activate the Webhook authorization mode in your kubelet.
Invalid endpoint error
Example error message:
"log": "2020-04-02T08:36:16Z E! cloudwatchlogs: code: InvalidEndpointURL, message: invalid endpoint uri, original error: &url.Error{Op:\"parse\", URL:\"https://logs.{{region_name}}.amazonaws.com/\", Err:\"{\"}, &awserr.baseError{code:\"InvalidEndpointURL\", message:\"invalid endpoint uri\", errs:[]error{(*url.Error)(0xc0008723c0)}}\n",
To resolve this issue, make sure that you replace all placeholder values in your commands. In the preceding example, the endpoint still contains the literal {{region_name}} placeholder. Confirm that the values that you use for cluster-name and region-name are correct for your deployment.
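One way to catch this problem before deployment is to search the manifest for leftover placeholders. The following is a minimal sketch. The function name find_placeholders is hypothetical, and the pattern assumes that placeholders use the {{name}} form shown in the error message.

```shell
# Print every line (with its line number) that still contains a {{name}}
# style placeholder. Reads the manifest from standard input.
# Exit status is 0 when at least one placeholder is found, 1 otherwise.
find_placeholders() {
  grep -n -E '\{\{[A-Za-z_]+\}\}'
}

# Example usage (rendered.yaml is a placeholder file name):
#   find_placeholders < rendered.yaml
```

If the function prints any lines, then replace those placeholders and apply the manifest again.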
Pod metrics missing on Amazon EKS or Kubernetes after cluster upgrade
Example error message:
"W! No pod metric collected"
If your pod metrics are missing after you upgrade your cluster, then check that the container runtime on the node is working as expected.
To resolve this issue, update your deployment manifest to mount the containerd socket from the host into the container.
Example deployment manifest:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
spec:
  template:
    spec:
      containers:
        - name: cloudwatch-agent
          # ...
          # Don't change the mountPath
          volumeMounts:
            # ...
            - name: dockersock
              mountPath: /var/run/docker.sock
              readOnly: true
            - name: varlibdocker
              mountPath: /var/lib/docker
              readOnly: true
            - name: containerdsock # NEW mount
              mountPath: /run/containerd/containerd.sock
              readOnly: true
            # ...
      volumes:
        # ...
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
        - name: varlibdocker
          hostPath:
            path: /var/lib/docker
        - name: containerdsock # NEW volume
          hostPath:
            path: /run/containerd/containerd.sock
For a full example of the manifest, see cwagent-daemonset.yaml on the GitHub website.
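As a quick sanity check on your local copy of the manifest, you can count how often the containerd socket path appears. The following is a rough sketch, not an official validation step. The function name count_containerd_socket_refs is hypothetical, and the check is purely text based, so it doesn't validate the YAML structure.

```shell
# Count the lines in a manifest (read from standard input) that reference
# the containerd socket path. A manifest that follows the example above
# references the path at least twice: once in volumeMounts and once in volumes.
count_containerd_socket_refs() {
  grep -c -F '/run/containerd/containerd.sock'
}

# Example usage against a local copy of the manifest:
#   count_containerd_socket_refs < cwagent-daemonset.yaml
```

A count lower than 2 suggests that either the mount or the volume is missing.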
No pod metrics when using Bottlerocket for Amazon EKS
Example error message:
"W! No pod metric collected"
Bottlerocket uses a different containerd socket path on the host. If you use Bottlerocket, then you must update the volumes in your manifest to use the Bottlerocket socket path.
Example volume configuration:
volumes:
  # ...
  - name: containerdsock
    hostPath:
      # path: /run/containerd/containerd.sock
      # bottlerocket does not mount containerd sock at normal place
      # https://github.com/bottlerocket-os/bottlerocket/commit/91810c85b83ff4c3660b496e243ef8b55df0973b
      path: /run/dockershim.sock
Unexpected log volume increase from CloudWatch agent when collecting Prometheus metrics
To resolve this issue, update the CloudWatch agent to the latest available version. To find your current version, see Finding information about CloudWatch agent versions. To install the latest version, see Install the CloudWatch agent.
CrashLoopBackOff error on the CloudWatch agent
To resolve this issue, make sure that you correctly configured your AWS Identity and Access Management (IAM) permissions.
CloudWatch agent or Fluentd pod stuck in pending
Your pod might be stuck in the Pending state. Or, you receive a FailedScheduling error from your CloudWatch agent or Fluentd pods. To resolve this issue, confirm that your nodes have enough compute resources based on the number of CPU cores and the amount of RAM that the agents require.
To describe the pods, run the following command:
kubectl describe pod cloudwatch-agent-85ppg -n amazon-cloudwatch
Note: Replace cloudwatch-agent-85ppg with your pod name.
ConfigMap for Fluent Bit not deployed correctly
To resolve this issue, confirm that you correctly deployed the fluent-bit-config ConfigMap in the amazon-cloudwatch namespace.
Example error messages:
[2024/10/02 11:16:42] [error] [config] inconsistent use of tab and space
[2024/10/02 11:16:42] [error] [config] error in /fluent-bit/etc/..2024_10_02_11_16_29.3759745087//application-log.conf:62: invalid indentation level
[2024/10/02 11:16:42] [error] configuration file contains errors, aborting.
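The "inconsistent use of tab and space" error points to tab characters in a configuration file that otherwise uses spaces for indentation. The following is a minimal sketch for locating those lines in a local copy of the configuration before you redeploy the ConfigMap. The function name find_tab_lines is hypothetical.

```shell
# Print every line (with its line number) that contains a tab character.
# Reads the configuration file from standard input.
find_tab_lines() {
  # Build a literal tab portably, because not every grep understands "\t".
  tab=$(printf '\t')
  grep -n "$tab"
}

# Example usage against a local copy of the file named in the error message:
#   find_tab_lines < application-log.conf
```

Replace the tabs on the reported lines with spaces so that the indentation is consistent, and then deploy the ConfigMap again.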
