CoreDNS can't resolve on public subnet in VPC with public and private subnets


I completed the AWS EKS setup using their documented steps.

AWS EKS version 1.11, CoreDNS

For the VPC I created two public and two private subnets according to their docs here: https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html

Nodes deployed to a private subnet are labeled private and nodes deployed to a public subnet are labeled public.
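
For reference, the labels were applied roughly like this (a sketch; the "subnet" label key is just an assumption for illustration, substitute whatever key/values and node names you actually use):

# kubectl label nodes <node-name> subnet=private
# kubectl label nodes <node-name> subnet=public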

When I deploy a busybox pod to each node type via a nodeSelector (public/private), the pod on the public node cannot resolve DNS while the pod on the private node can.

nslookup: can't resolve 'kubernetes.default'

If I SSH onto the public-subnet node itself, I can ping hostnames (e.g. google.com) successfully.

Any thoughts?
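
For reference, the test pods look roughly like this (a minimal sketch; the "subnet" nodeSelector matches the node labels assumed above, and busybox:1.28.4 is the image I used):

# cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox-public
spec:
  nodeSelector:
    subnet: public            # "private" for the busybox-private pod
  containers:
  - name: busybox
    image: busybox:1.28.4
    command: ["sleep", "3600"]
EOF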

# kubectl exec -it busybox-private -- nslookup kubernetes.default

Server:    172.20.0.10
Address 1: 172.20.0.10 ip-172-20-0-10.ec2.internal

Name:      kubernetes.default
Address 1: 172.20.0.1 ip-172-20-0-1.ec2.internal
# kubectl exec -it busybox-public -- nslookup kubernetes.default
Server:    172.20.0.10
Address 1: 172.20.0.10

nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
# kubectl -n=kube-system get all
NAME                           READY     STATUS    RESTARTS   AGE
pod/aws-node-46626             1/1       Running   0          3h
pod/aws-node-52rqw             1/1       Running   1          3h
pod/aws-node-j7n8l             1/1       Running   0          3h
pod/aws-node-k7kbr             1/1       Running   0          3h
pod/aws-node-tr8x7             1/1       Running   0          3h
pod/coredns-7bcbfc4774-5ssnx   1/1       Running   0          20h
pod/coredns-7bcbfc4774-vxrgs   1/1       Running   0          20h
pod/kube-proxy-2c7gj           1/1       Running   0          3h
pod/kube-proxy-5qr9h           1/1       Running   0          3h
pod/kube-proxy-6r96f           1/1       Running   0          3h
pod/kube-proxy-9tqxt           1/1       Running   0          3h
pod/kube-proxy-bhkzx           1/1       Running   0          3h

NAME               TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE
service/kube-dns   ClusterIP   172.20.0.10   <none>        53/UDP,53/TCP   20h

NAME                        DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/aws-node     5         5         5         5            5           <none>          20h
daemonset.apps/kube-proxy   5         5         5         5            5           <none>          20h

NAME                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/coredns   2         2         2            2           20h

NAME                                 DESIRED   CURRENT   READY     AGE
replicaset.apps/coredns-7bcbfc4774   2         2         2         20h

Going through "Debugging DNS Resolution"
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/

Odd that AWS still has their CoreDNS pods labelled kube-dns.

# kubectl get pods --namespace=kube-system -l k8s-app=kubedns
No resources found.

# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY     STATUS    RESTARTS   AGE
coredns-7bcbfc4774-5ssnx   1/1       Running   0          20h
coredns-7bcbfc4774-vxrgs   1/1       Running   0          20h

# for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do kubectl logs --namespace=kube-system $p; done
2019/01/31 15:23:36 [INFO] CoreDNS-1.1.3
2019/01/31 15:23:36 [INFO] linux/amd64, go1.10.5, d47c9319
.:53
CoreDNS-1.1.3
linux/amd64, go1.10.5, d47c9319
2019/01/31 15:23:36 [INFO] CoreDNS-1.1.3
2019/01/31 15:23:36 [INFO] linux/amd64, go1.10.5, d47c9319
.:53
CoreDNS-1.1.3
linux/amd64, go1.10.5, d47c9319
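
The next step in that guide is to verify that the kube-dns service has endpoints; both CoreDNS pod IPs should show up there:

# kubectl get endpoints kube-dns --namespace=kube-system
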
ahec
asked 5 years ago, 803 views
4 Answers

The busybox image should be <= 1.28.4; nslookup in newer busybox images is known to be broken.
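
For example, a quick one-off test with a pinned image (a sketch; the pod lands on an arbitrary node unless you add a nodeSelector):

# kubectl run -it --rm busybox-test --image=busybox:1.28.4 --restart=Never -- nslookup kubernetes.default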

dvohra
answered 5 years ago

dvohra wrote:
The busybox image should be <= 1.28.4; nslookup in newer busybox images is known to be broken.

Thanks. I was on busybox:1.28.4.

ahec
answered 5 years ago

I think I found the issue by looking at the worker node security groups.

The AWS EKS kube-dns endpoints and pods were on the private subnet.

I have two CloudFormation stacks: one for autoscaling nodes in the private subnets and one for autoscaling nodes in the public subnets.

They didn't have a common security group, so the pods running on the public nodes weren't able to reach the kube-dns pods running on the private nodes.

Once I updated the worker node security groups to allow cross-communication, DNS started working.
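
Roughly, the change boils down to adding reciprocal ingress rules between the two worker-node security groups. A sketch with placeholder group IDs (I allowed all traffic between them; at minimum, DNS needs the public-node pods to reach port 53 UDP/TCP on the private nodes where CoreDNS runs):

# aws ec2 authorize-security-group-ingress --group-id sg-PRIVATE-NODES --protocol udp --port 53 --source-group sg-PUBLIC-NODES
# aws ec2 authorize-security-group-ingress --group-id sg-PRIVATE-NODES --protocol tcp --port 53 --source-group sg-PUBLIC-NODES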

Please post if anyone sees any unintended consequences. Thanks!

ahec
answered 5 years ago

ahec wrote:
I think I found the issue by looking at the worker node security groups.

The AWS EKS kube-dns endpoints and pods were on the private subnet.

I have two CloudFormation stacks: one for autoscaling nodes in the private subnets and one for autoscaling nodes in the public subnets.

They didn't have a common security group, so the pods running on the public nodes weren't able to reach the kube-dns pods running on the private nodes.

Once I updated the worker node security groups to allow cross-communication, DNS started working.

Please post if anyone sees any unintended consequences. Thanks!

Thanks for this. I was having DNS issues with S3 endpoints, and I have a similar setup to yours: two ASGs, one in each AZ, per the cluster autoscaler documentation. The CloudFormation templates I used were the AWS ones, so they did not automatically add the cross-AZ security group rules (the default template adds a self-referencing rule to the SG it creates for worker nodes). Adding a rule allowing all traffic for cross-AZ node communication fixed our DNS issues immediately.

Edited by: rrasco on Aug 27, 2019 1:55 PM

rrasco
answered 5 years ago
