Hello,
Please follow the document below:
[https://repost.aws/knowledge-center/eks-failed-create-pod-sandbox]
Hello,
To address the issue, check your image pull secrets:
- Ensure you have configured imagePullSecrets correctly in your pod specification so the pod can authenticate with the private ghcr.io registry.
apiVersion: v1
kind: Pod
metadata:
  name: your-pod-name
spec:
  containers:
    - name: your-container-name
      image: ghcr.io/aura-nw/long-campaign-be:euphoria_6cf3d53
  imagePullSecrets:
    - name: your-secret-name
Create a secret with your GitHub Container Registry credentials:
kubectl create secret docker-registry your-secret-name \
  --docker-server=ghcr.io \
  --docker-username=your-username \
  --docker-password=your-personal-access-token
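Under the hood, this secret is just a .dockerconfigjson entry whose auth field is base64("username:token"). A minimal sketch of the payload kubectl generates (the "user" and "pass" values here are placeholders, not real credentials):

```shell
# Build the auth payload a docker-registry secret stores for ghcr.io.
# "user:pass" is a placeholder; kubectl substitutes your real username
# and personal access token.
auth=$(printf '%s' "user:pass" | base64)
printf '{"auths":{"ghcr.io":{"auth":"%s"}}}\n' "$auth"
# → {"auths":{"ghcr.io":{"auth":"dXNlcjpwYXNz"}}}
```

Inspecting the generated secret with `kubectl get secret your-secret-name -o yaml` should show this same structure, base64-encoded once more at the Secret level.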
Check Node Resources:
- Ensure your worker nodes have sufficient resources (CPU, memory, network bandwidth) to pull the images efficiently. Insufficient resources can cause delays in pulling large images.
- Monitor node status and resource usage:
kubectl describe nodes
- Ensure VPC, subnets, and security groups are properly configured for internet access and efficient image pulling.
These steps should help in resolving the issue with image pulling in your EKS cluster.
Node resources seem normal. I can pull normally when I exec into a test pod and pull manually, but image pulling during container creation is very slow, without any error.
Hello,
When pulling images from a private registry, use imagePullSecrets in the workload manifest to specify the credentials. These credentials authenticate with the private registry, allowing the pod to pull images from the specified private repository.
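For a Deployment (rather than a bare Pod), imagePullSecrets goes inside the pod template spec, alongside containers. A minimal sketch, with your-secret-name standing in for whatever secret name you created:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: your-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: your-app
  template:
    metadata:
      labels:
        app: your-app
    spec:
      containers:
        - name: your-container-name
          image: ghcr.io/aura-nw/long-campaign-be:euphoria_6cf3d53
      imagePullSecrets:
        - name: your-secret-name
```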
In addition: I have a VPC endpoint for S3 but no VPC endpoints for the ECR api and dkr services. ECR is in the same Region. The EKS EC2 workers have internet access via a NAT gateway.
I am experiencing a similar problem, presumably since updating from EKS 1.30 with AL2 to EKS 1.32 with the latest AL2023 EKS-optimized AMI.
Details:
The problem occurs intermittently. When I update my application, which consists of dozens of microservices, 1 or 2 pods get stuck in Pending status (pulling an image from the ECR pull-through cache for Docker Hub) on one of the nodes, while pods on other nodes successfully pull the same image from the ECR cache and start. If I forcibly kill a stuck pod, it downloads the image in a few seconds and starts successfully. Please pay serious attention to this issue. It is not related to the 12h temporary token expiry, because it appears much more often: when I update my application several times an hour, I always find random pods on random nodes hanging with this problem.
Thank you, but nothing in the docs helped. There is no error; the pull just takes a very long time but still succeeds in the end.