CoreDNS can't resolve on public subnet in VPC with public and private subnets

I completed the AWS EKS setup using their documented steps.

AWS EKS version 1.11, CoreDNS

Within the VPC I created two public and two private subnets according to their docs here: https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html

Nodes deployed to a private subnet are labeled private and nodes deployed to a public subnet are labeled public.

When I deploy a busybox pod to each node type via a nodeSelector (public/private), the public pod cannot resolve DNS while the private pod can.

nslookup: can't resolve 'kubernetes.default'
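
For context, a minimal sketch of how the two test pods can be created, assuming the nodes carry a label such as subnet=public / subnet=private (the label key and values here are an assumption; the pod names match the kubectl exec commands below):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-private
spec:
  nodeSelector:
    subnet: private        # assumed node label
  containers:
  - name: busybox
    image: busybox:1.28.4
    command: ["sleep", "3600"]
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox-public
spec:
  nodeSelector:
    subnet: public         # assumed node label
  containers:
  - name: busybox
    image: busybox:1.28.4
    command: ["sleep", "3600"]
EOF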

If I SSH onto the public subnet node itself, I am able to ping hostnames (e.g. google.com) successfully.

Any thoughts?

# kubectl exec -it busybox-private -- nslookup kubernetes.default

Server:    172.20.0.10
Address 1: 172.20.0.10 ip-172-20-0-10.ec2.internal

Name:      kubernetes.default
Address 1: 172.20.0.1 ip-172-20-0-1.ec2.internal
# kubectl exec -it busybox-public -- nslookup kubernetes.default
Server:    172.20.0.10
Address 1: 172.20.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
# kubectl -n=kube-system get all
NAME                           READY     STATUS    RESTARTS   AGE
pod/aws-node-46626             1/1       Running   0          3h
pod/aws-node-52rqw             1/1       Running   1          3h
pod/aws-node-j7n8l             1/1       Running   0          3h
pod/aws-node-k7kbr             1/1       Running   0          3h
pod/aws-node-tr8x7             1/1       Running   0          3h
pod/coredns-7bcbfc4774-5ssnx   1/1       Running   0          20h
pod/coredns-7bcbfc4774-vxrgs   1/1       Running   0          20h
pod/kube-proxy-2c7gj           1/1       Running   0          3h
pod/kube-proxy-5qr9h           1/1       Running   0          3h
pod/kube-proxy-6r96f           1/1       Running   0          3h
pod/kube-proxy-9tqxt           1/1       Running   0          3h
pod/kube-proxy-bhkzx           1/1       Running   0          3h

NAME               TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE
service/kube-dns   ClusterIP   172.20.0.10   <none>        53/UDP,53/TCP   20h

NAME                        DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/aws-node     5         5         5         5            5           <none>          20h
daemonset.apps/kube-proxy   5         5         5         5            5           <none>          20h

NAME                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/coredns   2         2         2            2           20h

NAME                                 DESIRED   CURRENT   READY     AGE
replicaset.apps/coredns-7bcbfc4774   2         2         2         20h

Going through "Debugging DNS Resolution"
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
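
One of that guide's first checks is the resolver config inside the failing pod; against the pods above that would look something like this (both should point at 172.20.0.10, the kube-dns ClusterIP shown earlier):

# kubectl exec -it busybox-public -- cat /etc/resolv.conf
# kubectl exec -it busybox-private -- cat /etc/resolv.conf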

Odd that AWS still has their CoreDNS pods labelled kube-dns.

# kubectl get pods --namespace=kube-system -l k8s-app=kubedns
No resources found.

# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY     STATUS    RESTARTS   AGE
coredns-7bcbfc4774-5ssnx   1/1       Running   0          20h
coredns-7bcbfc4774-vxrgs   1/1       Running   0          20h

# for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do kubectl logs --namespace=kube-system $p; done
2019/01/31 15:23:36 [INFO] CoreDNS-1.1.3
2019/01/31 15:23:36 [INFO] linux/amd64, go1.10.5, d47c9319
.:53
CoreDNS-1.1.3
linux/amd64, go1.10.5, d47c9319
2019/01/31 15:23:36 [INFO] CoreDNS-1.1.3
2019/01/31 15:23:36 [INFO] linux/amd64, go1.10.5, d47c9319
.:53
CoreDNS-1.1.3
linux/amd64, go1.10.5, d47c9319
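
Another check from the same guide is whether the kube-dns Service has endpoints and which pod IPs they point at; as it turned out (see below), those endpoints all sat on nodes in the private subnets:

# kubectl get endpoints kube-dns --namespace=kube-system
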
ahec
asked 5 years ago · 790 views
4 Answers

The busybox images should be <= 1.28.4
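
Newer busybox builds ship an nslookup applet that can report failures even when cluster DNS is healthy, so pinning the tag matters. To confirm which tag a test pod is actually running, something like:

# kubectl get pod busybox-public -o jsonpath='{.spec.containers[0].image}'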

dvohra
answered 5 years ago
0

dvohra wrote:
The busybox images should be <= 1.28.4

Thanks. I was already on busybox:1.28.4.

ahec
answered 5 years ago

Looking at the worker node security groups is where I think I found the issue.

The AWS EKS kube-dns (CoreDNS) endpoints and pods were all on the private subnets.

I have two CloudFormation stacks: one for autoscaling nodes in the private subnets and one for autoscaling nodes in the public subnets.

They didn't share a common security group, so pods running on the public nodes weren't able to reach the kube-dns pods running on the private nodes.

Once I updated the worker node security groups to allow cross-communication, DNS started working.

Please post if anyone sees any unintended consequences. Thanks!
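
A sketch of the minimal cross-SG rules with the AWS CLI, assuming sg-PRIVATE-NODES and sg-PUBLIC-NODES are placeholders for the two worker node security groups (the fix above allowed all cross-communication; DNS itself only needs 53/udp and 53/tcp into the security group of the nodes hosting CoreDNS):

aws ec2 authorize-security-group-ingress --group-id sg-PRIVATE-NODES --protocol udp --port 53 --source-group sg-PUBLIC-NODES
aws ec2 authorize-security-group-ingress --group-id sg-PRIVATE-NODES --protocol tcp --port 53 --source-group sg-PUBLIC-NODES

After that, kubectl exec -it busybox-public -- nslookup kubernetes.default should resolve again.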

ahec
answered 5 years ago

ahec wrote:
Once I updated the worker node security groups to allow cross-communication, DNS started working.

Thanks for this. I was having DNS issues with S3 endpoints and I have a similar setup to yours. I have two ASGs, one in each AZ, per the cluster autoscaler documentation. The CloudFormation templates I used were the AWS ones, so they did not automatically add the cross-AZ security group rules (the default template adds a self-referencing rule to the SG it creates for worker nodes). Adding a rule for all traffic for cross-AZ node communication fixed our DNS issues immediately.
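
For reference, the kind of rule being described might look roughly like this in CloudFormation, assuming NodeSecurityGroup is this stack's worker node SG and OtherNodeGroupSecurityGroup is a parameter holding the other stack's worker node SG ID (both names are placeholders):

  NodeSecurityGroupFromOtherNodeGroup:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      Description: Allow worker nodes in the other node group to reach this group on all ports
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref OtherNodeGroupSecurityGroup
      IpProtocol: "-1"
      FromPort: 0
      ToPort: 65535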

Edited by: rrasco on Aug 27, 2019 1:55 PM

rrasco
answered 5 years ago
