Unpacking the Cluster Networking for Amazon EKS Hybrid Nodes

14 minute read
Content level: Advanced

A step-by-step walkthrough on setting up the cluster networking for EKS Hybrid Nodes, including different Container Network Interface (CNI) and load balancing options

AWS has recently launched a new EKS feature called Amazon EKS Hybrid Nodes. With EKS Hybrid Nodes, customers can use their existing on-premises and edge infrastructure as nodes in Amazon EKS clusters. This provides you with the flexibility to run your container workloads anywhere, while maintaining a consistent Kubernetes management experience.

One of the key aspects of the EKS Hybrid Nodes solution is the hybrid networking architecture between the cloud-hosted Amazon EKS control plane and the hybrid nodes running in your environment. This post provides a comprehensive guide to setting up the cluster networking for your EKS hybrid nodes, exploring various Container Network Interface (CNI) and load balancing options.

 

EKS Hybrid Nodes - Demo Architecture Overview

Below is an architecture overview of our EKS Hybrid Nodes demo environment. This post focuses on the networking part of the hybrid cluster; I will not cover how to set up the EKS control plane itself - please refer to this blog post for the details on that part.

Lab Arch Overview

Pre-requisites

  • hybrid private L3 connectivity between your on-prem environment and AWS (e.g. VPN or Direct Connect)
  • local compute infrastructure (bare-metal or hypervisors)
  • supported OS for hybrid nodes: Amazon Linux 2023, RHEL 8/9, Ubuntu 20.04/22.04/24.04
  • 2x routable private CIDR blocks for RemoteNodeNetwork and RemotePodNetwork
  • AWS CLI version 2.22.8 or later (or 1.36.13 or later for AWS CLI v1) with appropriate credentials
  • Helm v3 for installing CNIs and other add-ons such as AWS Load Balancer Controller
  • nodeadm for preparing and registering the hybrid nodes

Specifically for this demo, I have configured an IPsec VPN to connect my on-prem lab environment to the EKS VPC using an AWS Transit Gateway (TGW). I have deployed an EKS control plane running v1.31.5 and provisioned 2x EKS hybrid nodes using nodeadm. The EKS hybrid nodes are running Ubuntu 24.04 on a VMware ESXi 8 hypervisor.

In addition, I have deployed/reserved the below CIDR blocks for this demo setup (the CLI snippet after the list shows how the two remote CIDRs map to the cluster configuration):

  • EKS VPC - 10.250.0.0/16
  • EKS On-prem Node CIDR - 192.168.200.0/24 (RemoteNodeNetwork)
  • EKS On-prem Pod CIDR - 192.168.32.0/23 (RemotePodNetwork)
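
For reference, the RemoteNodeNetwork and RemotePodNetwork CIDRs are supplied as the cluster's remote network configuration at creation time. The abridged AWS CLI sketch below is illustrative only - the other required create-cluster parameters (role ARN, VPC config, access config, etc.) are omitted, so check the EKS Hybrid Nodes documentation for the full command.

$ aws eks create-cluster --name <my-cluster> \
    --remote-network-config '{"remoteNodeNetworks":[{"cidrs":["192.168.200.0/24"]}],"remotePodNetworks":[{"cidrs":["192.168.32.0/23"]}]}' \
    ... <other required create-cluster parameters> ...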

You'll also need to configure your on-prem firewall (if any) to allow inbound https/443 traffic from the EKS control plane towards the RemoteNodeNetwork and RemotePodNetwork, as well as outbound https/443 traffic from those networks to the EKS control plane. Similarly, on the AWS side, you must ensure the EKS cluster Security Group allows ingress https/443 traffic from the RemoteNodeNetwork and RemotePodNetwork, and outbound traffic towards them.
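
For example, the ingress rules on the EKS cluster Security Group could be added with the AWS CLI as follows (the security group ID is a placeholder for your cluster's security group):

$ aws ec2 authorize-security-group-ingress --group-id <eks-cluster-sg-id> \
    --protocol tcp --port 443 --cidr 192.168.200.0/24
$ aws ec2 authorize-security-group-ingress --group-id <eks-cluster-sg-id> \
    --protocol tcp --port 443 --cidr 192.168.32.0/23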


 
To learn more, visit Prerequisites for EKS Hybrid Nodes in the Amazon EKS User Guide.

 

Configure Container Network Interface (CNI)

The Amazon VPC CNI is not compatible with EKS Hybrid Nodes, so you must install an alternative CNI for the nodes to become ready to serve workloads. Newly provisioned hybrid nodes will appear as NotReady until a CNI is installed. Cilium and Calico are the supported CNI options for Amazon EKS Hybrid Nodes.

In most cases, your on-prem router will already have a route to the Node subnet. However, you still need to ensure the CNI-provisioned Pod CIDRs are routed towards the corresponding nodes. This is important for the EKS control plane to reach the webhook Pods running on your hybrid nodes.

Depending on your on-prem environment, you can use either static or BGP routing to achieve this.

 

Static Routing (Calico example)

If you have a small on-prem environment, or your on-prem router doesn't support BGP, you can set up static routing for pre-defined Pod CIDRs pointing to each node. However, this requires you to manually assign a Pod CIDR block to each node. In this example, I will demonstrate how to achieve this using the Calico CNI.


In order to assign a fixed Pod CIDR block to each node, we'll first need to apply custom labels to the nodes. Note the nodes are currently showing "NotReady" since this is a newly provisioned cluster with no CNI installed yet.

$ kubectl get nodes
NAME                   STATUS     ROLES    AGE    VERSION
mi-060cf84b2fb8eb550   NotReady   <none>   132m   v1.31.5-eks-5d632ec
mi-0a5409bb63b61fff0   NotReady   <none>   131m   v1.31.5-eks-5d632ec

$ kubectl label nodes mi-060cf84b2fb8eb550 zone=node-01
node/mi-060cf84b2fb8eb550 labeled

$ kubectl label nodes mi-0a5409bb63b61fff0 zone=node-02
node/mi-0a5409bb63b61fff0 labeled

We'll then create a Calico config file assigning specific Pod CIDRs to each node using the custom labels. These blocks must fall within the RemotePodNetwork defined when the EKS cluster was provisioned.

$ cat calico-values.yaml 
installation:
  enabled: true
  cni:
    type: Calico
    ipam:
      type: Calico
  calicoNetwork:
    bgp: Disabled
    ipPools:
    - cidr: 192.168.32.0/26
      encapsulation: VXLAN
      natOutgoing: Enabled
      nodeSelector: zone == "node-01"
    - cidr: 192.168.32.64/26
      encapsulation: VXLAN
      natOutgoing: Enabled
      nodeSelector: zone == "node-02"  
  controlPlaneReplicas: 1
  controlPlaneNodeSelector:
    eks.amazonaws.com/compute-type: hybrid
  calicoNodeDaemonSet:
    spec:
      template:
        spec:
          nodeSelector:
            eks.amazonaws.com/compute-type: hybrid
  csiNodeDriverDaemonSet:
    spec:
      template:
        spec:
          nodeSelector:
            eks.amazonaws.com/compute-type: hybrid
  calicoKubeControllersDeployment:
    spec:
      template:
        spec:
          nodeSelector:
            eks.amazonaws.com/compute-type: hybrid
  typhaDeployment:
    spec:
      template:
        spec:
          nodeSelector:
            eks.amazonaws.com/compute-type: hybrid

Now use Helm to install Calico with the config file.

$ helm repo add projectcalico https://docs.tigera.io/calico/charts
$ helm install calico projectcalico/tigera-operator --version 3.29.2 --namespace kube-system -f calico-values.yaml
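
Before checking the IP pools, you can optionally confirm the Calico components have come up - the tigera-operator normally deploys them into the calico-system namespace:

$ kubectl get pods -n calico-system -o wide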

You should see 2x Pod IP pools automatically created, each containing a /26 CIDR assigned to a specific node (based on the custom labels we defined earlier).

$ kubectl get ippools.crd.projectcalico.org 
NAME               AGE
192.168.32.0-26    5d3h
192.168.32.64-26   5d3h

$ kubectl describe ippools 192.168.32.0-26
Name:         192.168.32.0-26
... <truncated>...
  Block Size:     26
  Cidr:           192.168.32.0/26
  Ipip Mode:      Never
  Nat Outgoing:   true
  Node Selector:  zone == "node-01"
  Vxlan Mode:     Always

$ kubectl describe ippools 192.168.32.64-26
Name:         192.168.32.64-26
... <truncated>...
  Block Size:     26
  Cidr:           192.168.32.64/26
  Ipip Mode:      Never
  Nat Outgoing:   true
  Node Selector:  zone == "node-02"
  Vxlan Mode:     Always

Lastly, configure 2x static Pod routes (towards the 2x nodes) at the on-prem router, and then redistribute them to the rest of the network.

vyos@VyOS-RT01# set protocols static route 192.168.32.0/26 next-hop 192.168.200.11
vyos@VyOS-RT01# set protocols static route 192.168.32.64/26 next-hop 192.168.200.12
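
As with any VyOS configuration change, commit (and optionally save) so the static routes take effect:

vyos@VyOS-RT01# commit
vyos@VyOS-RT01# save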

The status of the hybrid nodes should now change to Ready with the Calico CNI installed.
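
You can quickly confirm this from the cluster side - both nodes should now report a Ready status:

$ kubectl get nodes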


 

BGP (Cilium example)

Another common method for managing cluster route distribution is via BGP, and I will use Cilium as the CNI to demonstrate this. Specifically, the on-prem router will be configured as a BGP Route Reflector (RR): it learns the Pod routes dynamically and reflects them between the nodes, without sitting in the datapath for intra-cluster traffic.


First, we create a Cilium config file specifying the whole Pod CIDR range (192.168.32.0/23) with a /26 block size per node. Again, this CIDR range needs to match the RemotePodNetwork defined when the EKS cluster was provisioned.

$ cat cilium-values.yaml 
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: eks.amazonaws.com/compute-type
          operator: In
          values:
          - hybrid
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4MaskSize: 26
    clusterPoolIPv4PodCIDRList:
    - 192.168.32.0/23
operator:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: In
            values:
              - hybrid
  unmanagedPodWatcher:
    restart: false
envoy:
  enabled: false

Next, we'll deploy Cilium using the config file, setting bgpControlPlane.enabled to true.

$ helm repo add cilium https://helm.cilium.io/
$ CILIUM_VERSION=1.16.7

$ helm install cilium cilium/cilium \
    --version ${CILIUM_VERSION} \
    --namespace kube-system \
    --values cilium-values.yaml \
    --set bgpControlPlane.enabled=true

You should see the cilium-operator Pods and a cilium agent Pod running on each of your hybrid nodes.

$ kubectl get pods -n kube-system -o wide | grep cilium
cilium-kw6ng                       1/1     Running   0              7m40s   192.168.200.12   mi-0a5409bb63b61fff0   <none>           <none>
cilium-operator-6dd8d6d489-x6zqt   1/1     Running   0              7m40s   192.168.200.12   mi-0a5409bb63b61fff0   <none>           <none>
cilium-operator-6dd8d6d489-xdw29   1/1     Running   0              7m40s   192.168.200.11   mi-060cf84b2fb8eb550   <none>           <none>
cilium-x4qjq                       1/1     Running   0              7m40s   192.168.200.11   mi-060cf84b2fb8eb550   <none>           <none>

We then create and apply a BGP cluster configuration, including the local and peer ASNs, the peer router address, etc.

$ cat cilium-bgp-cluster.yaml 
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPClusterConfig
metadata:
  name: cilium-bgp
spec:
  nodeSelector:
    matchExpressions:
    - key: eks.amazonaws.com/compute-type
      operator: In
      values:
      - hybrid
  bgpInstances:
  - name: "home_lab"
    localASN: 65432
    peers:
    - name: "VyOS_RT01"
      peerASN: 65432
      peerAddress: 192.168.200.254
      peerConfigRef:
        name: "cilium-peer"

$ kubectl apply -f cilium-bgp-cluster.yaml 
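
Note that the cluster config above references a peerConfigRef named cilium-peer. If you don't already have one, you'll also need a CiliumBGPPeerConfig resource. Below is a minimal example based on the Cilium 1.16 BGP Control Plane CRDs (treat it as a sketch): the timers and graceful restart roughly mirror the VyOS settings shown later, and the families selector matches the advertise: bgp label used in the next step.

$ cat cilium-bgp-peerconfig.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeerConfig
metadata:
  name: cilium-peer
spec:
  timers:
    holdTimeSeconds: 30
    keepAliveTimeSeconds: 10
  gracefulRestart:
    enabled: true
    restartTimeSeconds: 120
  families:
    - afi: ipv4
      safi: unicast
      advertisements:
        matchLabels:
          advertise: bgp

$ kubectl apply -f cilium-bgp-peerconfig.yaml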

Finally, we'll configure the cluster to advertise Pod CIDRs from each node to the upstream router (BGP RR).

$ cat cilium-bgp-advertisement.yaml 
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: bgp-advertisements
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: "PodCIDR"

$ kubectl apply -f cilium-bgp-advertisement.yaml 

For reference, here is the BGP configuration at the on-prem VyOS router (BGP RR).

set protocols bgp system-as '65432'
set protocols bgp neighbor 192.168.200.11 address-family ipv4-unicast route-reflector-client
set protocols bgp neighbor 192.168.200.11 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 192.168.200.11 graceful-restart 'enable'
set protocols bgp neighbor 192.168.200.11 remote-as '65432'
set protocols bgp neighbor 192.168.200.11 timers holdtime '30'
set protocols bgp neighbor 192.168.200.11 timers keepalive '10'
set protocols bgp neighbor 192.168.200.12 address-family ipv4-unicast route-reflector-client
set protocols bgp neighbor 192.168.200.12 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 192.168.200.12 graceful-restart 'enable'
set protocols bgp neighbor 192.168.200.12 remote-as '65432'
set protocols bgp neighbor 192.168.200.12 timers holdtime '30'
set protocols bgp neighbor 192.168.200.12 timers keepalive '10'

We can verify the BGP status at the EKS cluster side using the Cilium CLI.

$ cilium bgp peers
Node                   Local AS   Peer AS   Peer Address      Session State   Uptime   Family         Received   Advertised
mi-060cf84b2fb8eb550   65432      65432     192.168.200.254   established     57m30s   ipv4/unicast   2          2    
mi-0a5409bb63b61fff0   65432      65432     192.168.200.254   established     57m32s   ipv4/unicast   2          2    

On the on-prem router, we can see the sessions established and the 2x /26 Pod CIDRs received from the hybrid nodes.

vyos@VyOS-RT01:~$ show ip bgp 
BGP table version is 2, local router ID is 192.168.200.254, vrf id 0
Default local pref 100, local AS 65432
Status codes:  s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i 192.168.32.0/26  192.168.200.12                100      0 i
 *>i 192.168.32.64/26 192.168.200.11                100      0 i

 

Configure Load Balancing

In addition to the CNI component, you'll also need to install a load balancing solution so that you can expose microservices externally.

If the services are only required for on-prem private access, we could deploy a locally hosted load balancing solution such as MetalLB.

Alternatively, if we are planning to publish the services externally to the Region or to the Internet, we could leverage a cloud-hosted solution such as the AWS Load Balancer Controller.

 

On-prem Load Balancer (MetalLB example)

MetalLB is one of the most commonly deployed load balancing solutions for on-prem Kubernetes environments.

It offers an L2 Advertisement mode, which reserves a small address block from the Node subnet for LoadBalancer external addresses. This simplifies LoadBalancer address assignment and eliminates the complexity of setting up BGP routing. (NOTE: If you are already running Cilium as the CNI, you could potentially leverage Cilium's native load balancing (LB IPAM with L2 announcements) to achieve the same goal - see the sketch below.)
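
As a rough sketch of that Cilium-native alternative (not used in this demo), you would create a load balancer IP pool and an L2 announcement policy along these lines - note this also requires enabling l2announcements.enabled (and its prerequisites) in the Cilium Helm values, so check the Cilium L2 Announcements documentation before relying on it:

# Illustrative only - resource names and the address range are examples.
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: lb-pool
spec:
  blocks:
    - start: 192.168.200.201
      stop: 192.168.200.220
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: l2-announcements
spec:
  loadBalancerIPs: true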

To begin, we'll first install MetalLB following the official installation guide. Make sure the LB controller and speaker Pods are running correctly.

$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.9/config/manifests/metallb-native.yaml

$ kubectl get pods -n metallb-system 
NAME                          READY   STATUS    RESTARTS   AGE
controller-74b6dc8f85-ngswz   1/1     Running   0          3d5h
speaker-qg8ps                 1/1     Running   0          3d5h
speaker-sgpp2                 1/1     Running   0          3d5h

Next, we'll reserve a small address block (192.168.200.201-220) from the Node subnet and assign it to the LB external addresses using the L2 Advertisement mode. In L2 mode, MetalLB responds to ARP requests for the LB addresses within the Node subnet, eliminating the need for any L3 route advertisement.

$ cat metallb-config.yaml 

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: metallb-system
spec:
  addresses:
  - 192.168.200.201-192.168.200.220
  autoAssign: true
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
  - default

$ kubectl apply -f metallb-config.yaml 

Now let's test this with Online Boutique, a popular microservices demo app.

$ git clone --depth 1 --branch v0 https://github.com/GoogleCloudPlatform/microservices-demo.git
$ cd microservices-demo/
$ kubectl apply -f ./release/kubernetes-manifests.yaml

After the deployment is completed, you should see a LoadBalancer Service deployed with an External-IP (192.168.200.201) assigned from the previously defined LB address pool.

$ kubectl get svc  frontend-external
NAME                TYPE           CLUSTER-IP       EXTERNAL-IP       PORT(S)        AGE
frontend-external   LoadBalancer   172.16.208.141   192.168.200.201   80:32373/TCP   112s

From the on-prem network, you should be able to access the frontend service via the Load Balancer's External-IP.
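
For example, a quick check from any machine on the on-prem network should return an HTTP 200:

$ curl -s -o /dev/null -w '%{http_code}\n' http://192.168.200.201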


 

External Load Balancer (AWS Load Balancer Controller example)

If you need to expose services to the Internet or back to the AWS region, you could leverage the AWS Load Balancer Controller.

One advantage of the AWS Load Balancer Controller is that it supports both LoadBalancer and Ingress resources using AWS-native networking services such as Network Load Balancers (NLBs) and Application Load Balancers (ALBs). Additionally, you can use an ALB to provide SSL termination and offloading, and to add an extra layer of security through integration with edge services such as AWS WAF or Shield.
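
For instance, if you later want the ALB to terminate TLS and attach a WAF web ACL, you would typically add annotations along these lines to the Ingress (the certificate and web ACL ARNs below are placeholders):

  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:<region>:<account-id>:certificate/<certificate-id>
    alb.ingress.kubernetes.io/wafv2-acl-arn: arn:aws:wafv2:<region>:<account-id>:regional/webacl/<name>/<id>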

As a prerequisite, within the EKS VPC you'll need to prepare at least 2x private subnets and 2x public subnets (for private & public ALBs) with the following tags (a CLI example follows the lists below):

Private Subnets

  • Key – kubernetes.io/role/internal-elb
  • Value – 1

Public Subnets

  • Key – kubernetes.io/role/elb
  • Value – 1
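
The tags can be applied via the console or with the AWS CLI - the subnet IDs below are placeholders:

$ aws ec2 create-tags --resources <private-subnet-1> <private-subnet-2> \
    --tags Key=kubernetes.io/role/internal-elb,Value=1
$ aws ec2 create-tags --resources <public-subnet-1> <public-subnet-2> \
    --tags Key=kubernetes.io/role/elb,Value=1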

We'll first need to associate an OIDC provider with the cluster, and then create an IAM policy for the Load Balancer Controller.

$ cluster_name=<my-cluster>
$ eksctl utils associate-iam-oidc-provider --cluster $cluster_name --approve

$ curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.11.0/docs/install/iam_policy.json
$ aws iam create-policy  --policy-name AWSLoadBalancerControllerIAMPolicy  --policy-document file://iam_policy.json

Next, create a Kubernetes service account for the Load Balancer Controller with an associated IAM role, attaching the policy created in the previous step.

$ eksctl create iamserviceaccount \
  --cluster=<my-cluster> \
  --namespace=kube-system \
  --name=aws-load-balancer-controller \
  --role-name AmazonEKSLoadBalancerControllerRole \
  --attach-policy-arn=arn:aws:iam::<my-account-id>:policy/AWSLoadBalancerControllerIAMPolicy \
  --approve

Install the AWS Load Balancer Controller using Helm and then verify the deployment status.

$ helm repo add eks https://aws.github.io/eks-charts
$ helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<my-cluster> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set region=<my-vpc-region> \
  --set vpcId=<my-vpc-id>

$ kubectl get deployment -n kube-system aws-load-balancer-controller
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
aws-load-balancer-controller   2/2     2            2           39s

To test the controller, we'll deploy the 2048 game as a sample application exposed through a Kubernetes Ingress (backed by an ALB).

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.11.0/docs/examples/2048/2048_full.yaml

Note that in order to register Pods directly as targets for the ALB, you'll need to set the target type to IP mode. This is configured using the alb.ingress.kubernetes.io/target-type: ip annotation, as defined in the 2048 deployment file (Ingress section).

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip

Now let's verify the deployment status of the game Pods (5x replicas) and the Ingress. Notice an ALB is automatically provisioned for the Kubernetes Ingress, and we get a public DNS address for the published service.

$ kubectl get pods -n game-2048 -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE                   NOMINATED NODE   READINESS GATES
deployment-2048-7df5f9886b-78gw5   1/1     Running   0          15m   192.168.32.25   mi-060cf84b2fb8eb550   <none>           <none>
deployment-2048-7df5f9886b-bp5kj   1/1     Running   0          15m   192.168.32.81   mi-0a5409bb63b61fff0   <none>           <none>
deployment-2048-7df5f9886b-hp4fh   1/1     Running   0          15m   192.168.32.26   mi-060cf84b2fb8eb550   <none>           <none>
deployment-2048-7df5f9886b-mtrrp   1/1     Running   0          15m   192.168.32.24   mi-060cf84b2fb8eb550   <none>           <none>
deployment-2048-7df5f9886b-wp67m   1/1     Running   0          15m   192.168.32.82   mi-0a5409bb63b61fff0   <none>           <none>

$ kubectl get ingress -n game-2048 
NAME           CLASS   HOSTS   ADDRESS                                                                      PORTS   AGE
ingress-2048   alb     *       k8s-game2048-ingress2-184c5765e8-32714527.ap-southeast-2.elb.amazonaws.com   80      8m29s

Back in the AWS console, we can confirm the ALB is provisioned with 5x registered IP targets mapping to the Pod IPs shown above.


You can now access the game by navigating to the Ingress/ALB DNS address.


 

Conclusion

In this post, I have walked through detailed steps for setting up the cluster networking for Amazon EKS hybrid nodes. I have demonstrated different CNI options and load balancing solutions for various use cases and scenarios.

To learn more, please refer to the following resources: