
EKS kubernetes service and inter-AZ traffic

6 minute read
Content level: Expert

The kubernetes service and its Endpoints and EndpointSlice objects in the default namespace are a frequent source of questions and even misunderstanding. This article explains how they are implemented and what the consequences are for inter-AZ traffic via the cross-account ENI (X-ENI).

Kubernetes basics

Let's start with a quick recap of how Kubernetes (K8s) manages Endpoints and EndpointSlice objects.

Endpoints are handled by the endpoint-controller as part of the kube-controller-manager (KCM). This is visible on the object itself as a label endpoints.kubernetes.io/managed-by: endpoint-controller. An Endpoints K8s object has the same name as the related K8s Service object.

Endpoints are deprecated as of K8s/EKS 1.33. Details can be found in the K8s upstream blog post Kubernetes v1.33: Continuing the transition from Endpoints to EndpointSlices, and kubectl prints a corresponding warning: Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice.

Here is an example:

# "kube-dns" service
$ kubectl get svc -n kube-system kube-dns
NAME       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.100.0.10   <none>        53/UDP,53/TCP,9153/TCP   99d

# corresponding "kube-dns" endpoint
$ kubectl get ep -n kube-system kube-dns  -o yaml
Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
apiVersion: v1
kind: Endpoints
metadata:
...
  labels:
    eks.amazonaws.com/component: kube-dns
    endpoints.kubernetes.io/managed-by: endpoint-controller
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
  name: kube-dns
  namespace: kube-system

...

EndpointSlices are handled by the endpointslice-controller, also part of the kube-controller-manager (KCM). The controller automatically creates EndpointSlices for any Kubernetes Service that has a selector specified.
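A Service can own several EndpointSlices with generated names (such as kube-dns-lzht5 in the output that follows). For that reason, slices are looked up through the kubernetes.io/service-name label rather than by name.

The sketch below replaces the deprecated `kubectl get ep` call with a query against discovery.k8s.io/v1 EndpointSlice. It assumes kubectl access to the cluster, and the function name service_endpoint_ips is hypothetical.

```shell
# Hypothetical helper (illustrative name): list the addresses behind a Service using
# the discovery.k8s.io/v1 EndpointSlice API instead of the deprecated v1 Endpoints.
# A Service may own several slices, so they are selected via the
# kubernetes.io/service-name label and all of them are iterated.
# $1: namespace of the Service, $2: Service name
service_endpoint_ips() {
  kubectl get endpointslices -n "$1" -l "kubernetes.io/service-name=$2" \
    -o jsonpath='{range .items[*]}{range .endpoints[*]}{.addresses[0]}{"\n"}{end}{end}'
}

# Usage, instead of `kubectl get ep -n kube-system kube-dns`:
#   service_endpoint_ips kube-system kube-dns
```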

The following output snippet shows the relevant information:

$ kubectl get endpointslices -n kube-system kube-dns-lzht5 -o yaml
...
kind: EndpointSlice
metadata:
  annotations:
    endpoints.kubernetes.io/last-change-trigger-time: "2026-04-21T08:13:45Z"
...
  labels:
    eks.amazonaws.com/component: kube-dns
    endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
    kubernetes.io/service-name: kube-dns
  name: kube-dns-lzht5
  namespace: kube-system
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Service
    name: kube-dns
...
- addresses:
  - 192.168.141.163
  conditions:
    ready: true
    serving: true
    terminating: false
  nodeName: i-<redacted>
  targetRef:
    kind: Pod
    name: coredns-fd7d56586-5jkn2
    namespace: kube-system
    ...
  zone: eu-west-1a
...

Note the ownerReferences pointing to the owning kube-dns Service object and the label endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io. In addition, each endpoint carries a zone attribute, which is the basis for Topology Aware Routing.
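A small sketch like the following shows which endpoints of a Service carry a zone, and therefore which ones zone-based routing could take into account. It makes a few assumptions:

- You have kubectl access to the cluster.
- Both function names are hypothetical.
- kubectl's jsonpath output prints an empty string for a missing .zone field. This is the default behaviour with --allow-missing-template-keys=true.

```shell
# Hypothetical helpers (illustrative names).
# endpoint_zones NS SVC: print "<address> <zone>" for every endpoint of a Service,
# read from all EndpointSlices labelled kubernetes.io/service-name=SVC.
endpoint_zones() {
  kubectl get endpointslices -n "$1" -l "kubernetes.io/service-name=$2" \
    -o jsonpath='{range .items[*]}{range .endpoints[*]}{.addresses[0]}{" "}{.zone}{"\n"}{end}{end}'
}

# endpoints_without_zone: read "<address> <zone>" lines on stdin and print only the
# addresses that have no zone attribute (a single field on the line).
endpoints_without_zone() {
  awk 'NF == 1 { print $1 }'
}

# Usage:
#   endpoint_zones kube-system kube-dns
#   endpoint_zones default kubernetes | endpoints_without_zone
```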

For user-created Endpoints belonging to a Service without a selector, the endpointslice-mirror-controller automatically creates corresponding EndpointSlices for backward compatibility (see the upstream K8s doc EndpointSlice mirroring). This behaviour can be prevented with the label endpointslice.kubernetes.io/skip-mirror: "true".
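To illustrate the mirroring rule, the sketch below prints a Service without a selector plus a hand-written Endpoints object for it, optionally carrying the skip-mirror label. The helper name external_service_manifest, the port name https and the example IP in the usage line are assumptions for illustration only.

```shell
# Hypothetical helper (illustrative name): print a selector-less Service and a
# hand-written Endpoints object for it. Without the skip-mirror label the
# endpointslice-mirror-controller would mirror these Endpoints into an EndpointSlice.
# $1: name, $2: namespace, $3: backend IP, $4: port, $5: "true" to add skip-mirror
external_service_manifest() {
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: $1
  namespace: $2
spec:
  ports:
  - name: https
    port: $4
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: $1
  namespace: $2
EOF
  # Only add the label block when mirroring should be suppressed.
  if [ "${5:-false}" = "true" ]; then
    printf '  labels:\n    endpointslice.kubernetes.io/skip-mirror: "true"\n'
  fi
  cat <<EOF
subsets:
- addresses:
  - ip: $3
  ports:
  - name: https
    port: $4
    protocol: TCP
EOF
}

# Usage (example values):
#   external_service_manifest my-backend default 10.0.0.50 443 true | kubectl apply -f -
```

The port name is kept identical in the Service and the Endpoints, because the two are matched by port name.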

"kubernetes" service

Let's look at the kubernetes service and its related objects in the default namespace:

# "kubernetes" service
$ kubectl get svc kubernetes
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   251d

# corresponding "kubernetes" endpoint
$ kubectl get ep kubernetes -o yaml
Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
apiVersion: v1
kind: Endpoints
metadata:
...
  labels:
    endpointslice.kubernetes.io/skip-mirror: "true"
  name: kubernetes
  namespace: default
...
subsets:
- addresses:
  - ip: <redacted IP1>
  - ip: <redacted IP2>
  ports:
  - name: https
    port: 443
    protocol: TCP

# corresponding "kubernetes" endpointslices
$ kubectl get endpointslices kubernetes -o yaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - <redacted IP1>
  conditions:
    ready: true
- addresses:
  - <redacted IP2>
  conditions:
    ready: true
kind: EndpointSlice
metadata:
...
  labels:
    kubernetes.io/service-name: kubernetes
  name: kubernetes
  namespace: default
...
ports:
- name: https
  port: 443
  protocol: TCP

First of all, neither the Endpoints nor the EndpointSlice object contains a managed-by label. The Endpoints object even has the label endpointslice.kubernetes.io/skip-mirror: "true"! In addition, there is no zone attribute (topology information) in the EndpointSlice.

What does this mean?

These objects are managed neither by the endpoint-controller nor by the endpointslice-controller, and the endpointslice-mirror-controller is prevented from mirroring the Endpoints object!
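To check which EndpointSlices in a cluster carry no managed-by label, the label can be listed for all slices. This is a sketch that assumes kubectl access with cluster-wide read permissions, and the helper names are hypothetical. The dots in the label key have to be escaped in the jsonpath expression.

```shell
# Hypothetical helpers (illustrative names).
# slices_managed_by: print "<namespace>/<name> <managed-by label>" for every EndpointSlice.
slices_managed_by() {
  kubectl get endpointslices -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{" "}{.metadata.labels.endpointslice\.kubernetes\.io/managed-by}{"\n"}{end}'
}

# unmanaged_slices: read those lines on stdin and print slices without the label.
unmanaged_slices() {
  awk 'NF == 1 { print $1 }'
}

# Usage:
#   slices_managed_by | unmanaged_slices
```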

What about the IP addresses in the objects?

These are not pod IPs from the cluster CIDR but rather the IPs of the cross-account EKS ENIs (X-ENIs)! You can double-check this with:

$ aws ec2 describe-network-interfaces --query 'NetworkInterfaces[?Description==`Amazon EKS <cluster name>`].[NetworkInterfaceId,PrivateIpAddress,AvailabilityZoneId]' --output table
----------------------------------------------------------
|                DescribeNetworkInterfaces               |
+------------------------+-------------------+-----------+
|  eni-<redacted> |  <redacted IP1>   |  euw1-az1 |
|  eni-<redacted> |  <redacted IP2>   |  euw1-az2 |
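The same comparison can be scripted. The sketch below assumes both lists have been exported to files, one IP per line; the commands in the comments are one way to do that. The function name classify_slice_ips and the output labels x-eni/other are hypothetical.

```shell
# Hypothetical helper (illustrative name): tag every address of the "kubernetes"
# EndpointSlice as "x-eni" if it is one of the EKS X-ENI private IPs, else "other".
# $1: file with EndpointSlice addresses, $2: file with X-ENI IPs (one IP per line)
classify_slice_ips() {
  # The X-ENI file is read first (ARGV[1]) to build a lookup table.
  awk 'FILENAME == ARGV[1] { eni[$1] = 1; next }
       NF { print $1, (($1 in eni) ? "x-eni" : "other") }' "$2" "$1"
}

# Usage (replace <cluster name> with your cluster name):
#   kubectl get endpointslice kubernetes \
#     -o jsonpath='{range .endpoints[*]}{.addresses[0]}{"\n"}{end}' > slice-ips.txt
#   aws ec2 describe-network-interfaces \
#     --query 'NetworkInterfaces[?Description==`Amazon EKS <cluster name>`].PrivateIpAddress' \
#     --output text | tr '\t' '\n' > eni-ips.txt
#   classify_slice_ips slice-ips.txt eni-ips.txt
```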

But how are these objects created?

My first thought, and you might guess the same, was that something EKS-specific runs in the control plane, probably a Lambda function or a Step Functions state machine, updating the objects on every kube-apiserver change (scale-in/out of the control plane, version or security-related updates, etc.).

But this is not the case: it is the default upstream K8s implementation.

"kubernetes" service implementation

To understand this, we need to look into the GitHub repo kubernetes/kubernetes:

pkg/controlplane is the Kubernetes-specific implementation of a running API server instance. It sits on top of the generic k8s.io/apiserver framework and adds everything that makes it specifically the Kubernetes API server.

Its entry point is pkg/controlplane/instance.go, which defines a kube-apiserver Config and CompletedConfig.New() method. That method is called by cmd/kube-apiserver/app/server.go during startup and does two things relevant here:

  • it selects an endpoint reconciler strategy (lease by default) and constructs an EndpointsAdapter for it
  • it then instantiates the kubernetesservice.Controller and registers it as the "bootstrap-controller" post-start hook.

Once the kube-apiserver's HTTPS listener is up, that hook fires and starts the controller's reconcile loop, which periodically calls UpdateKubernetesService. This first creates or updates the kubernetes Service object in the default namespace via CreateOrUpdateMasterServiceIfNeeded and then calls ReconcileEndpoints on the chosen reconciler.

The lease-based reconciler maintains per-apiserver TTL entries in etcd under the key space /registry/masterleases/<ip> and rebuilds the Endpoints object from the live set of kube-apiserver IPs. Every write to Endpoints goes through the EndpointsAdapter, which immediately mirrors the result into a matching EndpointSlice via EnsureEndpointSliceFromEndpoints. This keeps all three objects (Service, Endpoints, EndpointSlice) in sync for the lifetime of the cluster.

Basically creation and reconciliation of the kubernetes service, endpoint and endpointslices objects is part of the kube-apiserver itself!
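To make the lease idea concrete, here is a deliberately simplified model in shell. It is not the upstream Go code. Each kube-apiserver renews a lease for its own IP with a TTL, and the endpoint list is rebuilt from the leases that have not yet expired.

The function names, the one-line-per-lease text format and the epoch-second timestamps are assumptions for illustration. The real reconciler keeps the leases in etcd under /registry/masterleases/.

```shell
# Simplified, hypothetical model of lease-based endpoint reconciliation (not the
# upstream implementation). A lease is one text line "<ip> <expiry>", where
# <expiry> is an epoch timestamp in seconds.

# renew_lease IP NOW TTL: print the lease an apiserver writes for its own IP.
renew_lease() {
  printf '%s %s\n' "$1" "$(( $2 + $3 ))"
}

# live_apiserver_ips NOW: read leases on stdin and print the IPs whose lease has
# not expired yet, sorted and de-duplicated so the result is deterministic.
# This is the set the Endpoints (and mirrored EndpointSlice) addresses are rebuilt from.
live_apiserver_ips() {
  awk -v now="$1" 'NF == 2 && $2 > now { print $1 }' | sort -u
}

# Usage (example values, TTL of 15 seconds):
#   { renew_lease 10.0.1.10 995 15; renew_lease 10.0.2.20 980 15; } | live_apiserver_ips 1000
#   # prints 10.0.1.10; the lease of 10.0.2.20 expired at 995
```

The model also shows why a crashed apiserver lingers in the endpoint list: its IP stays "live" until its last lease runs out.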

The bootstrap-controller post-start hook is visible in the kube-apiserver /readyz endpoint:

$ kubectl get --raw '/readyz?verbose'
[+]ping ok
[+]log ok
[+]etcd ok
[+]kms-providers ok
[+]poststarthook/start-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
...
[+]poststarthook/bootstrap-controller ok
...

[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
healthz check passed
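Rather than scanning the full list by eye, you can extract the status of a single post-start hook from the verbose output. This is a minimal sketch. The function name hook_status is hypothetical, and it assumes the [+]/[-] line format shown above.

```shell
# Hypothetical helper (illustrative name): read the output of
# `kubectl get --raw '/readyz?verbose'` on stdin and print "ok", "failed" or
# "missing" for one post-start hook.
# Assumes lines of the form "[+]<check> ok" or "[-]<check> failed: ...".
hook_status() {
  awk -v want="poststarthook/$1" '
    { mark = substr($1, 1, 3); check = substr($1, 4) }
    check == want { print ((mark == "[+]") ? "ok" : "failed"); found = 1 }
    END { if (!found) print "missing" }'
}

# Usage:
#   kubectl get --raw '/readyz?verbose' | hook_status bootstrap-controller
```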

The upstream implementation of the objects behind the kubernetes service is vendor-agnostic and puts only plain IP addresses into the EndpointSlice. EKS is fully upstream-compliant and uses this implementation!

This explains why topology-aware routing to the X-ENIs cannot be achieved. Even if you manually add a zone attribute to the EndpointSlice, the ReconcileEndpoints method described above resets it to plain IPs within about 10 to 15 seconds (according to DefaultEndpointReconcilerInterval and DefaultEndpointReconcilerTTL)!

Unfortunately, this means it is not possible to reduce inter-AZ (Availability Zone) traffic to the X-ENIs!

Another consequence of this reconciliation is that if a kube-apiserver instance has already shut down, the Endpoints object can be out of date for up to 15 seconds, during which applications may try, and fail, to connect to it.

Note: Parts of the code investigation were done with Amazon Q.

AWS
EXPERT
published 24 days ago