I have had multiple EKS clusters running for a while and noticed that on one of them kubectl commands are sometimes surprisingly slow. They are all in the same region (the one closest to me), so this intermittent latency made no sense.
Digging into the connection details, I noticed that most of my EKS clusters return 2 IPs for the control plane hostname, except the one with the latency issue. That one returns 3 IPs.
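For anyone who wants to reproduce the comparison, here is a minimal sketch of how the IPs behind an endpoint hostname could be listed. It uses only the Python standard library; the function name `resolve_ips` is my own, and it assumes the endpoint resolves to IPv4 addresses. A caching resolver may not return every record on each lookup, so it can be worth running it a few times.

```python
import socket


def resolve_ips(hostname: str, port: int = 443) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses the resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ips(...)` with the hostname from the cluster's API server endpoint (without the `https://` prefix) should give 2 addresses for a healthy cluster and 3 for the affected one.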
The control plane IPs belong to ENIs registered in my account, so I was able to check why this specific cluster returns 3 IPs. It turned out that one of these IPs has no ENI behind it. This is the root cause of the occasional latency: whenever the DNS query returns this "old" IP, the connection times out after 10s and the client then tries the next IP, which works as expected.
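The timeout behaviour can be confirmed per IP with a plain TCP connect. This is only a sketch: the helper name `probe`, the default port 443, and the 3 second timeout are my own choices, not anything EKS or kubectl prescribes.

```python
import socket
import time


def probe(ip: str, port: int = 443, timeout: float = 3.0) -> tuple[bool, float]:
    """Try a TCP connect to ip:port; return (connected, seconds elapsed)."""
    start = time.monotonic()
    try:
        # create_connection raises an OSError subclass on timeout or refusal.
        with socket.create_connection((ip, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start
```

Running `probe(ip)` against each address returned for the endpoint should show the two live IPs connecting almost immediately, while the stale one fails only after the full timeout has elapsed.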
I have no idea why this old IP is still in the DNS records. The control plane DNS record is managed by AWS, and I have no access to it.
I have tried:
- switching the cluster endpoint between private and public access
- removing the subnet from the EKS control plane network configuration
- deleting the entire subnet the old IP belongs to
After each of these changes the 2 live IPs changed as expected, but the old 3rd IP is still there.
Has anybody seen this before? How could I force the removal of this old IP?