EKS Pod Fails to Pull Image from Docker Hub After Manual Trigger in Bitbucket Pipeline

I have an EKS cluster with an unmanaged Windows node group and a managed Linux node group. My applications run on this cluster, and one of them is deployed on the Windows node group.

For deployment, I use Bitbucket CI/CD pipelines and store my Docker images in Docker Hub private registries. My pipeline includes a manual trigger before the deployment step, which deploys the Docker image to my EKS cluster.

The issue occurs when I push the image to Docker Hub and the pipeline pauses due to the manual trigger. On the deployment day, when I approve the manual trigger, my pod fails to start with the following error in the events: rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/xyz/test-prod:432489c8a-43243.2025": failed to resolve reference.

However, when I rerun the pipeline, the deployment works fine, and the pods reach a running state.

What could be the reason behind this behavior, and how can I prevent it from happening again?

asked a year ago, 375 views
1 Answer

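Several things can cause an EKS pod to fail to pull an image from Docker Hub after a manual trigger in a Bitbucket pipeline. A NotFound error with "failed to resolve reference" usually means the registry did not return a manifest for that exact repository and tag, so first confirm that the tag requested by the deployment step is the one that was actually pushed. Possible causes: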

  1. Image Pull Secrets: If the Docker Hub credentials stored in the cluster's image pull secret were rotated, revoked, or expired between the push and the approval, the pull fails. Ensure that the secret referenced by the pod exists in the pod's namespace and holds current credentials.

  2. Network Connectivity: The Windows node group might have temporary network connectivity issues when trying to pull the image from Docker Hub. This could be due to transient network problems or firewall rules.

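  3. Docker Hub Rate Limits: Docker Hub limits image pulls for free and unauthenticated accounts. The delay between pushing and deploying does not affect these limits, and exceeding them normally produces a 429 (toomanyrequests) error rather than NotFound, so this is a less likely cause here.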

  4. Image Availability: There might be a delay in the image becoming available on Docker Hub after it's pushed, especially if it's a large image.
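To check the first cause, the pull secret can be decoded and inspected. The sketch below assumes the secret is of type kubernetes.io/dockerconfigjson and that its base64-encoded .dockerconfigjson value has already been fetched (for example with kubectl get secret <name> -o json). The function name and the list of Docker Hub registry keys are illustrative assumptions, not something from the original post.

```python
import base64
import json

# Registry keys commonly used for Docker Hub in a dockerconfigjson secret.
# This list is an assumption; adjust it to match how the secret was created.
DOCKER_HUB_KEYS = ("https://index.docker.io/v1/", "index.docker.io", "docker.io")


def docker_hub_credentials(dockerconfigjson_b64):
    """Return (username, password) stored for Docker Hub, or None if absent.

    `dockerconfigjson_b64` is the base64 string found under
    data[".dockerconfigjson"] in the Kubernetes secret.
    """
    config = json.loads(base64.b64decode(dockerconfigjson_b64))
    auths = config.get("auths", {})
    for key in DOCKER_HUB_KEYS:
        entry = auths.get(key)
        if entry is None:
            continue
        if "auth" in entry:
            # "auth" holds base64("username:password").
            decoded = base64.b64decode(entry["auth"]).decode()
            username, _, password = decoded.partition(":")
            return username, password
        return entry.get("username"), entry.get("password")
    return None
```

If the function returns None, the secret has no entry for Docker Hub. If it returns credentials, they can be compared with the ones the pipeline uses when it pushes the image.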

To prevent this issue from happening again, you can try the following:

  1. Use ECR instead of Docker Hub: Amazon Elastic Container Registry (ECR) is more tightly integrated with EKS and might provide more reliable image pulls.

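  2. Rely on, or add, image pull retries: The kubelet already retries failed image pulls automatically with an increasing back-off (the pod shows ImagePullBackOff), and the pod specification has no field for setting a retry count. If the failure is transient, the pod should recover on its own; otherwise, add retry logic to the deployment step of the pipeline.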

  3. Verify image pull secrets: Ensure your image pull secrets are correctly configured and not expiring.

  4. Check network connectivity: Ensure your Windows nodes have consistent network access to Docker Hub.

  5. Consider using a paid Docker Hub account: This can help avoid rate limiting issues.

  6. Implement a health check in your pipeline: Before triggering the deployment, add a step to verify that the image is available and can be pulled successfully.

  7. Use an image digest instead of a tag: A digest reference (repository@sha256:...) identifies one immutable image, so you pull exactly the image you pushed, even if there's a delay before deployment or the tag is later moved.
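Steps 6 and 7 can be combined in one pre-deployment check. The following is a minimal sketch, not a definitive implementation: it assumes Docker Hub's token endpoint (auth.docker.io) and registry endpoint (registry-1.docker.io), asks for the manifest of the tag with a HEAD request, retries a few times, and builds a digest-pinned reference from the Docker-Content-Digest response header. The function names, retry counts, and delays are hypothetical.

```python
import base64
import json
import time
import urllib.error
import urllib.parse
import urllib.request

AUTH_URL = "https://auth.docker.io/token"
REGISTRY_URL = "https://registry-1.docker.io"
MANIFEST_TYPES = ", ".join([
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
])


def fetch_digest(repo, tag, username, password):
    """Return the manifest digest of repo:tag on Docker Hub, or None if missing."""
    query = urllib.parse.urlencode(
        {"service": "registry.docker.io", "scope": f"repository:{repo}:pull"}
    )
    token_req = urllib.request.Request(f"{AUTH_URL}?{query}")
    basic = base64.b64encode(f"{username}:{password}".encode()).decode()
    token_req.add_header("Authorization", f"Basic {basic}")
    with urllib.request.urlopen(token_req, timeout=30) as resp:
        token = json.load(resp)["token"]

    head = urllib.request.Request(
        f"{REGISTRY_URL}/v2/{repo}/manifests/{tag}", method="HEAD"
    )
    head.add_header("Authorization", f"Bearer {token}")
    head.add_header("Accept", MANIFEST_TYPES)
    try:
        with urllib.request.urlopen(head, timeout=30) as resp:
            return resp.headers.get("Docker-Content-Digest")
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the tag does not exist in the registry
            return None
        raise  # auth or other errors should fail the pipeline step loudly


def wait_for_digest(lookup, attempts=5, delay=10.0, sleep=time.sleep):
    """Call lookup() until it returns a digest; raise if it never does."""
    for attempt in range(1, attempts + 1):
        digest = lookup()
        if digest:
            return digest
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(f"image tag not found after {attempts} attempts")


def pinned_reference(repo, digest):
    """Build a digest-pinned image reference for use in the pod spec."""
    return f"docker.io/{repo}@{digest}"
```

In the pipeline, this could run just before the deployment step, for example wait_for_digest(lambda: fetch_digest("xyz/test-prod", tag, user, token)), with the result passed to pinned_reference and used as the image in the manifest. If the check raises, the tag the deployment step is asking for is not in the registry, which matches the NotFound error from the question.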

By implementing these measures, you should be able to improve the reliability of your image pulls and reduce the likelihood of deployment failures after manual triggers.
answered a year ago
reviewed by an EXPERT a year ago
