Deploying Cilium Networking on Amazon EKS Hybrid Nodes

A practical guide to deploying the Cilium CNI on an Amazon EKS cluster with hybrid nodes, including a detailed walkthrough of static and BGP routing configurations, as well as load balancing options in both L2 and L3 modes for various scenarios.

Amazon EKS Hybrid Nodes enables organizations to seamlessly integrate their existing on-premises and edge infrastructure into Amazon EKS clusters as remote nodes. This provides you with the flexibility to deploy containerized applications in any environment while ensuring consistent Kubernetes operations, addressing requirements such as low latency, regulatory compliance, and data residency controls.

A critical piece of the EKS Hybrid Nodes architecture is the hybrid connectivity between the Amazon EKS control plane hosted in your VPC and the hybrid nodes running in your on-premises environment. In addition, to use EKS with hybrid nodes deployment, you must install a Container Network Interface (CNI) add-on since the Amazon VPC CNI is not compatible with hybrid nodes.

Cilium is one of the recommended CNI options for EKS Hybrid Nodes. Since joining the CNCF in 2021, Cilium has emerged as the fastest-growing Kubernetes CNI project, delivering the following architectural benefits:

  • eBPF-powered data plane delivering low-latency, high-performance networking with great scalability
  • Advanced security capabilities such as identity-based security, L7 network policy and transparent encryption
  • Comprehensive observability including real-time visibility into application dependencies and traffic flows
  • Flexible architecture offering seamless integration with multi-cluster (cluster mesh) and multi-cloud environments

This post walks you through the process for setting up Cilium CNI, and configuring different cluster networking modes and load balancing options for your Amazon EKS cluster with hybrid nodes. You can select the most suitable option based on your existing technologies and hybrid networking environment.

This guide covers the most common hybrid cluster networking models and scenarios, and aims to help expedite the deployment process for your hybrid nodes-enabled EKS cluster using the Cilium CNI.

 

Architecture Overview

The diagram below shows a high-level architecture overview of our EKS Hybrid Nodes demo environment.

(Architecture diagram: EKS Hybrid Nodes demo environment)

The node and pod CIDRs for your hybrid nodes and container workloads must be globally unique and routable across your hybrid network environment. When creating an EKS cluster with hybrid nodes, you provide these CIDRs as inputs via the RemoteNodeNetwork and RemotePodNetwork fields.

In most cases, your on-premises router will already have a route to the node subnet. However, you must also ensure that the Pod CIDRs and (optionally) the Load Balancer Service external IPs are routed towards each hybrid node, so that they can be reached from the EKS control plane and other in-region services, as well as from your on-premises networks.

As illustrated above, I will be using the below CIDR blocks for this demo setup:

  • EKS VPC CIDR - 10.250.0.0/16
  • On-premises Node CIDR (RemoteNodeNetwork) - 192.168.200.0/24
  • On-premises Pod CIDR (RemotePodNetwork) - 192.168.32.0/23
  • On-premises Load Balancer Service IPs (LoadBalancer, Ingress) - 192.168.48.x/32 (advertised via BGP)
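
These CIDRs correspond to the cluster's remote network configuration. If you want to double-check what an existing cluster was created with, the following AWS CLI query should return them (the cluster name is a placeholder, and the field name is as I recall it from the EKS API - adjust as needed):

$ aws eks describe-cluster --name my-hybrid-cluster \
    --query 'cluster.remoteNetworkConfig' --output json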

 

Pre-requisites

  • Deploy an Amazon EKS cluster with hybrid nodes - follow this blog post for a detailed walkthrough
  • On-premises compute nodes running a compatible operating system
  • Private connectivity between your on-premises network and Amazon VPC (via VPN or Direct Connect)
  • 2x routable CIDR blocks for RemoteNodeNetwork and RemotePodNetwork
  • 1x additional CIDR block to be used for Cilium Load Balancer external IPs (L3 BGP mode)
  • Configure on-premises firewall and the EKS cluster security group to allow bi-directional communications between the EKS control plane and remote node and pod CIDRs, as per the networking prerequisites
  • The following tools installed on your workstation: kubectl, Helm, and the Cilium CLI (a quick version check is shown below)
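
A quick way to confirm the client tooling is in place before starting (any recent versions should work; no specific versions are assumed here):

$ kubectl version --client
$ helm version --short
$ cilium version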

 

Walkthrough

Throughout this demo, I’ll be using Cilium v1.16.9, which is currently the latest patch release of the recommended version. I’ll walk you through the Cilium installation and configuration, covering the following aspects:

  • Cluster Networking using Static Routing
    • Cluster-Pool IPAM mode (default)
    • Multi-Pool IPAM mode (Beta)
  • Cluster Networking using BGP Routing
  • Load Balancing with L2 Announcement (ARP mode)
  • Load Balancing with L3 Announcement (BGP mode)
    • Load Balancer integration with BGP
    • Ingress integration with BGP
  • Clean Up

 

Cluster Networking using Static Routing

If you’re running only a few hybrid nodes or the local gateway lacks BGP support, static routing can be configured to simplify cluster network management. This approach requires manual assignment of Pod CIDRs to each node.

Cilium offers a few different IP Address Management (IPAM) methods, and I’ll show you how to configure static IP assignment for Pod CIDRs using the following two common IPAM modes:

  • Cluster-Pool IPAM mode
  • Multi-Pool IPAM mode

To ensure the Cilium components are scheduled onto our EKS hybrid nodes, I’ve prepared a base Helm values file with the following nodeAffinity configurations.

$ cat cilium-values-base.yaml 

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: eks.amazonaws.com/compute-type
          operator: In
          values:
          - hybrid
operator:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: eks.amazonaws.com/compute-type
            operator: In
            values:
              - hybrid
  unmanagedPodWatcher:
    restart: false
envoy:
  enabled: false

Cluster-Pool IPAM mode (default)

To begin, I’ll show you how to install Cilium and configure static Pod IP assignment using Cluster-Pool IPAM, which is the default IPAM mode for Cilium.

Let’s first verify the current EKS cluster status - note the newly provisioned hybrid nodes will remain in NotReady state until a CNI is deployed, as shown below.

$ kubectl get nodes
NAME                   STATUS     ROLES    AGE   VERSION
mi-0b396c58cf24e30b3   NotReady   <none>   12h   v1.32.1-eks-5d632ec
mi-0fb26f5f37789204f   NotReady   <none>   12h   v1.32.1-eks-5d632ec

We’ll now install Cilium with Cluster-Pool IPAM mode, using the base-value configuration defined earlier.

$ CILIUM_VERSION=1.16.9
$ helm install cilium cilium/cilium \
    --version ${CILIUM_VERSION} \
    --namespace kube-system \
    --values cilium-values-base.yaml \
    --set ipam.mode=cluster-pool \
    --set ipam.operator.clusterPoolIPv4PodCIDRList=192.168.32.0/23 \
    --set ipam.operator.clusterPoolIPv4MaskSize=26 

After Cilium is installed, we patch each CiliumNode resource to assign a static Pod CIDR, then restart the Cilium agent and operator for the change to take effect.

$ kubectl patch ciliumnode mi-0b396c58cf24e30b3 --type=merge -p '{"spec":{"ipam":{"podCIDRs":["192.168.32.0/26"]}}}'
$ kubectl patch ciliumnode mi-0fb26f5f37789204f --type=merge -p '{"spec":{"ipam":{"podCIDRs":["192.168.32.64/26"]}}}'

$ kubectl rollout restart daemonset -n kube-system cilium
$ kubectl rollout restart -n kube-system deployment cilium-operator

Verify both the cilium-operator and the cilium-agent pods are running on each of your hybrid nodes.

$ kubectl get pods -n kube-system -o wide | grep cilium
cilium-6s54z                       1/1     Running   0          12m   192.168.200.11   mi-0b396c58cf24e30b3   <none>           <none>
cilium-jgm6x                       1/1     Running   0          12m   192.168.200.12   mi-0fb26f5f37789204f   <none>           <none>
cilium-operator-7f65968cf5-57klx   1/1     Running   0          12m   192.168.200.11   mi-0b396c58cf24e30b3   <none>           <none>
cilium-operator-7f65968cf5-xh8c8   1/1     Running   0          12m   192.168.200.12   mi-0fb26f5f37789204f   <none>           <none>

We’ll also need to configure 2x /26 static routes for the Pod CIDRs at the upstream router, and redistribute them into the rest of the network. Below is a sample configuration using a VyOS router.

vyos@VyOS-RT01# set protocols static route 192.168.32.0/26 next-hop 192.168.200.11
vyos@VyOS-RT01# set protocols static route 192.168.32.64/26 next-hop 192.168.200.12
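
Note that VyOS set commands only take effect once committed. Assuming a standard VyOS configuration session, you can commit, persist, and verify the new static routes as follows:

vyos@VyOS-RT01# commit
vyos@VyOS-RT01# save
vyos@VyOS-RT01# run show ip route static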

The hybrid nodes now report a Ready status, with the Cilium CNI installed and a Pod CIDR statically assigned to each node.

$ kubectl get nodes
NAME                   STATUS     ROLES    AGE   VERSION
mi-0b396c58cf24e30b3   Ready      <none>   12h   v1.32.1-eks-5d632ec
mi-0fb26f5f37789204f   Ready      <none>   12h   v1.32.1-eks-5d632ec
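
As an optional sanity check (the pod name below is just an example), you can launch a throwaway pod and confirm it receives an address from the /26 assigned to the node it lands on:

$ kubectl run podcidr-test --image=nginx --restart=Never
$ kubectl get pod podcidr-test -o wide
$ kubectl delete pod podcidr-test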

Multi-Pool IPAM mode (Beta)

The Multi-Pool IPAM mode supports allocating Pod CIDRs from multiple IPAM pools, depending on workload annotations and node labels defined by the user. It provides more flexibility for Pod CIDR management, and supports “topology-aware” IPAM such as DC/Rack location-based IP pool allocations. However, please note that Multi-Pool IPAM mode is still in beta and has some limitations.
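
Besides the node-label approach used in this walkthrough, Multi-Pool IPAM also allows individual workloads to request a specific pool via the ipam.cilium.io/ip-pool annotation. Below is a minimal sketch, assuming a pool named node01-pool exists (we create it later in this section):

apiVersion: v1
kind: Pod
metadata:
  name: pool-demo
  annotations:
    ipam.cilium.io/ip-pool: node01-pool   # request an address from this CiliumPodIPPool
spec:
  containers:
  - name: nginx
    image: nginx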

To reconfigure the cluster IPAM mode, I’ll first reset the demo environment by removing the patches and uninstalling Cilium.

$ kubectl patch ciliumnode mi-0b396c58cf24e30b3 --type='json' -p='[{"op": "remove", "path": "/spec/ipam/podCIDRs", "value": ["192.168.32.0/26"]}]'
$ kubectl patch ciliumnode mi-0fb26f5f37789204f --type='json' -p='[{"op": "remove", "path": "/spec/ipam/podCIDRs", "value": ["192.168.32.64/26"]}]'

$ helm uninstall cilium -n kube-system

I’ll then re-install Cilium with Multi-Pool IPAM mode, which requires the additional options below, as explained here.

$ CILIUM_VERSION=1.16.9
$ helm install cilium cilium/cilium \
    --version ${CILIUM_VERSION} \
    --namespace kube-system \
    --values cilium-values-base.yaml \
    --set ipam.mode=multi-pool \
    --set ipv4NativeRoutingCIDR=192.168.32.0/23 \
    --set routingMode=native \
    --set autoDirectNodeRoutes=true \
    --set endpointRoutes.enabled=true \
    --set kubeProxyReplacement=true \
    --set bpf.masquerade=true
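
Once the installation completes, you can confirm the new IPAM and routing settings have landed in the Cilium configuration (key names may differ slightly between Cilium releases):

$ cilium config view | grep -E 'ipam|routing-mode|masquerade'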

To use Multi-Pool IPAM for per-node IP assignment, we’ll apply a custom label to each node.

$ kubectl label nodes mi-0b396c58cf24e30b3 zone=node-01
$ kubectl label nodes mi-0fb26f5f37789204f zone=node-02

We then create 2x Pod IP Pools and associate them with each node through a CiliumNodeConfig that matches the custom label.

$ cat cilium-static-pool.yaml 
---
apiVersion: cilium.io/v2alpha1
kind: CiliumPodIPPool
metadata:
  name: node01-pool
spec:
  ipv4:
    cidrs:
      - 192.168.32.0/26
    maskSize: 26
---
apiVersion: cilium.io/v2alpha1
kind: CiliumPodIPPool
metadata:
  name: node02-pool
spec:
  ipv4:
    cidrs:
      - 192.168.32.64/26
    maskSize: 26
---
apiVersion: cilium.io/v2
kind: CiliumNodeConfig
metadata:
  name: ip-pool-node01
  namespace: kube-system
spec:
  defaults:
    ipam-default-ip-pool: node01-pool
  nodeSelector:
    matchLabels:
      zone: node-01
---
apiVersion: cilium.io/v2
kind: CiliumNodeConfig
metadata:
  name: ip-pool-node02
  namespace: kube-system
spec:
  defaults:
    ipam-default-ip-pool: node02-pool
  nodeSelector:
    matchLabels:
      zone: node-02

Apply the per-node IP Pool configurations to each node, and restart Cilium.

$ kubectl apply -f cilium-static-pool.yaml

$ kubectl rollout restart daemonset -n kube-system cilium
$ kubectl rollout restart -n kube-system deployment cilium-operator

You might also need to restart all coredns pods to make sure they are getting the correct addresses from the new IP pools.

$ kubectl delete pod -n kube-system  coredns-7575495454-kq5tq coredns-7575495454-t2zm7 

We can use kubectl describe ciliumnode to verify the Pod IP assignment for each node. As shown below, each node is allocated a CIDR from its designated IP pool, as per our Multi-Pool IPAM configuration.

$ kubectl describe ciliumnode mi-0b396c58cf24e30b3
[...]
Spec:
  Addresses:
    Ip:    192.168.200.11
    Type:  InternalIP
    Ip:    192.168.32.2
    Type:  CiliumInternalIP
  Bootid:  6119bed6-ddec-4ab6-9581-78c830024c0e
  Encryption:
  Eni:
  Health:
    ipv4:  192.168.32.23
  Ingress:
  Ipam:
    Pools:
      Allocated:
        Cidrs:
          192.168.32.0/26
        Pool:  node01-pool
[...]


$ kubectl describe ciliumnode mi-0fb26f5f37789204f
[...]
Spec:
  Addresses:
    Ip:    192.168.200.12
    Type:  InternalIP
    Ip:    192.168.32.83
    Type:  CiliumInternalIP
  Bootid:  c2364ffd-5463-4ab8-9228-b47e5ad813ed
  Encryption:
  Eni:
  Health:
    ipv4:  192.168.32.93
  Ingress:
  Ipam:
    Pools:
      Allocated:
        Cidrs:
          192.168.32.64/26
        Pool:  node02-pool
[...]
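
You can also list the pools themselves to check that both are present and in use (the exact columns shown depend on your Cilium version):

$ kubectl get ciliumpodippools.cilium.io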

 

Cluster Networking using BGP Routing

For large-scale deployments, BGP is the recommended routing protocol, offering improved scalability and streamlined route management for your Pod CIDRs.

Next, we’ll deploy Cilium with BGP enabled and configure the on-premises router as a BGP Route Reflector (RR). This setup enables the on-premises router to dynamically learn Pod CIDRs via BGP without participating in the data path for intra-cluster communications.

$ helm uninstall cilium -n kube-system

$ CILIUM_VERSION=1.16.9
$ helm install cilium cilium/cilium \
    --version ${CILIUM_VERSION} \
    --namespace kube-system \
    --values cilium-values-base.yaml \
    --set ipam.mode=cluster-pool \
    --set ipam.operator.clusterPoolIPv4PodCIDRList=192.168.32.0/23 \
    --set ipam.operator.clusterPoolIPv4MaskSize=26 \
    --set bgpControlPlane.enabled=true

$ kubectl rollout restart daemonset -n kube-system cilium
$ kubectl rollout restart -n kube-system deployment cilium-operator

We then apply the BGP cluster and peer configurations, including the Autonomous System (AS) numbers, the peer router address, and BGP timers.

$ cat cilium-bgp-cluster.yaml 
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPClusterConfig
metadata:
  name: cilium-bgp
spec:
  nodeSelector:
    matchExpressions:
    - key: eks.amazonaws.com/compute-type
      operator: In
      values:
      - hybrid
  bgpInstances:
  - name: "home_lab"
    localASN: 65432
    peers:
    - name: "VyOS_RT01"
      peerASN: 65432
      peerAddress: 192.168.200.254
      peerConfigRef:
        name: "cilium-peer"
        
$ cat cilium-bgp-peer.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeerConfig
metadata:
  name: cilium-peer
spec:
  timers:
    holdTimeSeconds: 30
    keepAliveTimeSeconds: 10
  gracefulRestart:
    enabled: true
    restartTimeSeconds: 120
  families:
    - afi: ipv4
      safi: unicast
      advertisements:
        matchLabels:
          advertise: "bgp"

$ kubectl apply -f cilium-bgp-cluster.yaml 
$ kubectl apply -f cilium-bgp-peer.yaml

We'll configure the cluster to advertise Pod CIDRs from each node to the upstream router (BGP RR).

$ cat cilium-bgp-advertisement.yaml 
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: bgp-advertisements
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: "PodCIDR"

$ kubectl apply -f cilium-bgp-advertisement.yaml 

For reference, here is the BGP RR configuration at the on-premises VyOS router.

set protocols bgp system-as '65432'
set protocols bgp neighbor 192.168.200.11 address-family ipv4-unicast route-reflector-client
set protocols bgp neighbor 192.168.200.11 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 192.168.200.11 graceful-restart 'enable'
set protocols bgp neighbor 192.168.200.11 remote-as '65432'
set protocols bgp neighbor 192.168.200.11 timers holdtime '30'
set protocols bgp neighbor 192.168.200.11 timers keepalive '10'
set protocols bgp neighbor 192.168.200.12 address-family ipv4-unicast route-reflector-client
set protocols bgp neighbor 192.168.200.12 address-family ipv4-unicast soft-reconfiguration inbound
set protocols bgp neighbor 192.168.200.12 graceful-restart 'enable'
set protocols bgp neighbor 192.168.200.12 remote-as '65432'
set protocols bgp neighbor 192.168.200.12 timers holdtime '30'
set protocols bgp neighbor 192.168.200.12 timers keepalive '10'

Using the Cilium CLI, we can verify the BGP session status at the EKS hybrid nodes.

$ cilium bgp peers
Node                   Local AS   Peer AS   Peer Address      Session State   Uptime   Family         Received   Advertised
mi-0b396c58cf24e30b3   65432      65432     192.168.200.254   established     2m25s    ipv4/unicast   2          2    
mi-0fb26f5f37789204f   65432      65432     192.168.200.254   established     2m23s    ipv4/unicast   2          2   
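
The Cilium CLI can also show exactly which prefixes each node is advertising to the peer; the subcommand below is available in recent cilium-cli releases:

$ cilium bgp routes advertised ipv4 unicast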

This can also be confirmed at the on-premises router, which is now learning the 2x /26 Pod CIDRs from the hybrid nodes.

vyos@VyOS-RT01:~$ show ip bgp
BGP table version is 81, local router ID is 192.168.200.254, vrf id 0
Default local pref 100, local AS 65432
Status codes:  s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i 192.168.32.0/26  192.168.200.11                100      0 i
 *>i 192.168.32.64/26 192.168.200.12                100      0 i

 

Load Balancing with L2 Announcement (ARP mode)

Cilium leverages Load Balancer IP Address Management (LB IPAM) to assign CIDRs to Services of type LoadBalancer. To expose Load Balancer services running on the EKS Hybrid Nodes to your on-premises networks, Cilium supports LB IPAM in conjunction with either L2 Announcement via ARP, or L3 Announcement via BGP Control Plane.

The L2 Announcement feature is primarily intended for remote office or edge environments without BGP support. With L2 Announcement mode, we reserve a small address block from within the node subnet for Load Balancer external addresses. When a Load Balancer service is deployed, Cilium assigns a Virtual IP (VIP) from the LB IPAM pool and selects one node to respond to ARP requests for that VIP. For each service, one node responds to ARP requests from within the same layer 2 segment using its own MAC address, eliminating the need for BGP routing.

Keep in mind that L2 Announcement is still a beta feature and has some limitations.

We can use the following commands to enable Cilium L2 Announcement for Load Balancer services. Note that you must enable kube-proxy replacement mode, as documented here.

$ helm upgrade cilium cilium/cilium \
   --version ${CILIUM_VERSION} \
   --namespace kube-system \
   --reuse-values \
   --set externalIPs.enabled=true \
   --set l2announcements.enabled=true \
   --set kubeProxyReplacement=true 

$ kubectl rollout restart daemonset -n kube-system cilium
$ kubectl rollout restart -n kube-system deployment cilium-operator

We’ll first define a LB IPAM pool to reserve a small address block (192.168.200.201-220) from within the node subnet.

$ cat cilium-lb-l2pool.yaml 
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: cilium-lb-l2pool
  namespace: kube-system
spec:
  blocks:
    - start: 192.168.200.201
      stop: 192.168.200.220
 
$ kubectl apply -f cilium-lb-l2pool.yaml

We then configure an L2 announcement policy that defines which nodes (i.e. the hybrid nodes) will respond to ARP requests, and which services (i.e. Load Balancers with matching labels) will be announced by the policy.

$ cat cilium-l2-announcement.yaml 
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: cilium-lb-l2
  namespace: kube-system
spec:
  nodeSelector:
    matchLabels:
      eks.amazonaws.com/compute-type: hybrid
  serviceSelector:
    matchLabels:
      cilium-lb: "true"
  loadBalancerIPs: true
  externalIPs: true
  interfaces:
  - ens33

$ kubectl apply -f cilium-l2-announcement.yaml 

To test this, we’ll create a simple Nginx Pod.

$ kubectl run nginx --image=nginx --restart=Never --labels="app=nginx" --port=80 -n default

We’ll expose the Pod using a Load Balancer service with L2 announcement. To ensure Cilium announces ARP for the LB VIP, use the matching label as defined in the above policy. Also, make sure to set the loadBalancerClass to io.cilium/l2-announcer, as defined here.

$ cat nginx-lb-l2-cilium.yaml 
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb-l2
  labels:
    cilium-lb: "true"
spec:
  type: LoadBalancer
  loadBalancerClass: io.cilium/l2-announcer
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80

$ kubectl apply -f nginx-lb-l2-cilium.yaml 

As expected, a Load Balancer service is deployed with an external IP (VIP) from the LB IPAM pool within the node subnet, and we can access it without the need for BGP.

$ kubectl get svc -o wide | grep nginx
nginx-lb-l2     LoadBalancer   172.16.97.87   192.168.200.201   80:32367/TCP   102s   app=nginx
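
From a client on the on-premises network, the Nginx service should now respond directly on the L2-announced VIP, for example:

$ curl -s http://192.168.200.201 | grep title
<title>Welcome to nginx!</title>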


From the upstream router, we can see the ARP for this LB VIP (192.168.200.201) is announced and served by node-01.

vyos@VyOS-RT01:~$ show arp | grep 192.168.200.201
192.168.200.201  eth1         00:50:56:88:55:98     REACHABLE

vyos@VyOS-RT01:~$ show arp | grep  00:50:56:88:55:98 
192.168.200.201  eth1         00:50:56:88:55:98     REACHABLE
192.168.200.11   eth1         00:50:56:88:55:98     REACHABLE
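
Under the hood, Cilium elects the announcing node for each service using a Kubernetes lease in kube-system (named after the service, following a cilium-l2announce-<namespace>-<service> pattern as I understand it), so you can also check the current holder directly:

$ kubectl get leases -n kube-system | grep l2announce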

 

Load Balancing with L3 Announcement (BGP mode)

If you have already deployed Cilium with the BGP control plane, it is straightforward to enable Load Balancer L3 announcement, and no additional BGP configuration is required at the on-premises router. For this demo, we’ll test BGP integration with VIP announcement for both Kubernetes LoadBalancer services and Ingress resources.

In addition, if your on-premises router supports BGP Equal-Cost Multi-Path (ECMP), you can use this feature to load balance inbound service traffic across multiple nodes, as Cilium will by default advertise the Service VIPs from all hybrid nodes within your EKS cluster.

Before we proceed, we’ll remove the L2 Load Balancer service for the Nginx Pod and clean the L2 announcement configuration from previous steps.

$ kubectl delete -f nginx-lb-l2-cilium.yaml 
$ kubectl delete ciliumloadbalancerippools.cilium.io cilium-lb-l2pool
$ kubectl delete ciliuml2announcementpolicies.cilium.io cilium-lb-l2 

Load Balancer integration with BGP

To enable Load Balancer L3 advertisement, we will first edit the existing BGP advertisement configuration and add the following Service advertisement, so that Cilium’s BGP control plane advertises LoadBalancer external IPs (VIPs) that match the defined conditions.

$ cat cilium-bgp-advertisement.yaml 
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: bgp-advertisements
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: "PodCIDR"
    - advertisementType: "Service" ### add "Service" for BGP advertisement type
      service:
        addresses:
          - LoadBalancerIP
      selector:
        matchExpressions:
          - { key: lbpool, operator: In, values: [ dev, test, prod ] }
          
$ kubectl apply -f cilium-bgp-advertisement.yaml 

We then create a Load Balancer IP pool with a reserved and routable CIDR block and a matching label.

$ cat cilium-lb-l3pool-dev.yaml 
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: cilium-lb-l3pool-dev
spec:
  blocks:
    - cidr: 192.168.48.0/26
  serviceSelector:
    matchLabels:
      lbpool: dev
      
$ kubectl apply -f cilium-lb-l3pool-dev.yaml       

To test this, we’ll use the same Nginx pod from the previous step, but this time we’ll expose it using a Load Balancer service with L3 announcement (BGP).

Create a LoadBalancer service with a matching label; this triggers Cilium to assign a /32 Service VIP from the LB IP pool, which then gets advertised to the upstream router via BGP. Similar to the L2 announcement, this time we set the loadBalancerClass to io.cilium/bgp-control-plane.

$ cat nginx-lb-bgp-dev.yaml 
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb-bgp
  labels:
    lbpool: dev
spec:
  type: LoadBalancer
  loadBalancerClass: io.cilium/bgp-control-plane
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80

$ kubectl apply -f nginx-lb-bgp-dev.yaml

We can see a Load Balancer service is deployed with an external IP (VIP) assigned from the L3 LB IP pool.

$ kubectl get svc | grep nginx
nginx-lb-bgp   LoadBalancer   172.16.140.126   192.168.48.0   80:32315/TCP   5m57s

This /32 Load Balancer VIP is automatically advertised by Cilium to the upstream router via the BGP control plane. Also, since our on-premises router (VyOS) supports ECMP, we have both nodes listed as valid next hops.

vyos@VyOS-RT01:~$ show ip bgp 
BGP table version is 106, local router ID is 192.168.200.254, vrf id 0
Default local pref 100, local AS 65432
Status codes:  s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i 192.168.32.0/26  192.168.200.11                100      0 i
 *>i 192.168.32.64/26 192.168.200.12                100      0 i
 *>i 192.168.48.0/32  192.168.200.11                100      0 i
 *=i                  192.168.200.12                100      0 i

From the on-premises network, we can access the Nginx service through the Load Balancer’s VIP via BGP routing.

$ curl http://192.168.48.0
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
[...]

Ingress integration with BGP

Cilium provides built-in support for the Kubernetes Ingress resource, using an ingressClassName of cilium. Cilium’s Ingress can be exposed in multiple ways, such as via a LoadBalancer or NodePort service, or even via the host/node network - you can read more details here.

In this example, I’ll demonstrate how to deploy Cilium Ingress using a Load Balancer, since we have already enabled BGP announcement for Load Balancer services on our EKS cluster with hybrid nodes.

To do so, upgrade the Cilium configuration to enable Ingress Controller support. As a prerequisite, you must enable NodePort support, as documented here.

$ helm upgrade cilium cilium/cilium \
--version ${CILIUM_VERSION} \
--namespace kube-system \
--reuse-values \
--set nodePort.enabled=true \
--set ingressController.enabled=true \
--set ingressController.loadbalancerMode=dedicated

$ kubectl rollout restart daemonset -n kube-system cilium
$ kubectl rollout restart -n kube-system deployment cilium-operator

We’ll create a new BGP advertisement policy for the Load Balancer services deployed by the Cilium Ingress Controller, so that the LB VIPs for Cilium Ingress resources are automatically advertised via BGP.

$ cat cilium-bgp-advertisement-ingress.yaml 
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: bgp-advertisements-ingress
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: "Service"
      service:
        addresses:
          - LoadBalancerIP
      selector:
        matchExpressions:
          - { key: cilium.io/ingress, operator: In, values: [ 'true' ] }
          
$ kubectl apply -f cilium-bgp-advertisement-ingress.yaml

Note that the Load Balancer services deployed by the Cilium Ingress Controller are always tagged with the label cilium.io/ingress: "true", so we'll use it as the matching condition for BGP advertisement.

We’ll then create a dedicated LB IP pool for Ingress-created Load Balancer services with the corresponding label.

$ cat cilium-lb-l3pool-ingress.yaml 
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: cilium-lb-l3pool-ingress
spec:
  blocks:
    - cidr: 192.168.48.128/26
  serviceSelector:
    matchLabels:
      cilium.io/ingress: "true"

$ kubectl apply -f cilium-lb-l3pool-ingress.yaml

Now let's test this with the popular Kubernetes demo app Online Boutique.

$ git clone --depth 1 --branch v0 https://github.com/GoogleCloudPlatform/microservices-demo.git

Before we deploy the demo app, we'll make a quick update to the deployment manifest and comment out the frontend-external (LoadBalancer) service, as we'll be deploying an Ingress resource instead to expose the frontend service.

$ cd microservices-demo/
$ cat release/kubernetes-manifests.yaml

[...]
---
#apiVersion: v1
#kind: Service
#metadata:
#  name: frontend-external
#  labels:
#    app: frontend
#spec:
#  type: LoadBalancer
#  selector:
#    app: frontend
#  ports:
#  - name: http
#    port: 80
#    targetPort: 8080

$ kubectl apply -f ./release/kubernetes-manifests.yaml

Next, we create an Ingress resource for the frontend service - make sure to set the ingressClassName to cilium.

$ cat frontend-ingress-cilium.yaml 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-ingress
spec:
  ingressClassName: cilium
  rules:
  - host: shop.vxlan.co  
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
              
$ kubectl apply -f frontend-ingress-cilium.yaml

We can verify that Cilium has deployed an Ingress resource with the hostname as configured above. In addition, the Ingress is exposed via a Load Balancer service, named “cilium-ingress-<ingress-name>”, which is automatically created by the Cilium Ingress Controller.

$ kubectl get ingress
NAME               CLASS    HOSTS           ADDRESS          PORTS   AGE
frontend-ingress   cilium   shop.vxlan.co   192.168.48.128   80      13m

$ kubectl get svc | grep frontend-ingress
cilium-ingress-frontend-ingress   LoadBalancer   172.16.54.25     192.168.48.128   80:31321/TCP,443:31197/TCP   13m

Additionally, the Ingress-created Load Balancer VIP (192.168.48.128) is also advertised to the on-premises router via BGP.

vyos@VyOS-RT01:~$ show ip bgp 
BGP table version is 116, local router ID is 192.168.200.254, vrf id 0
Default local pref 100, local AS 65432
Status codes:  s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i 192.168.32.0/26  192.168.200.11                100      0 i
 *>i 192.168.32.64/26 192.168.200.12                100      0 i
 *>i 192.168.48.0/32  192.168.200.11                100      0 i
 *=i                  192.168.200.12                100      0 i
 *>i 192.168.48.128/32
                    192.168.200.11                100      0 i
 *=i                  192.168.200.12                100      0 i

After creating a DNS record for the hostname, we can access the frontend service of the demo app hosted on our EKS hybrid nodes via the Cilium Ingress.
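
If you’d like to test before creating the DNS record, curl’s --resolve option can pin the hostname to the Ingress VIP; a successful request should return an HTTP 200 from the frontend:

$ curl -s --resolve shop.vxlan.co:80:192.168.48.128 http://shop.vxlan.co/ -o /dev/null -w '%{http_code}\n'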


 

Clean Up

To remove the demo resources created in the above steps, use the following commands.

$ kubectl delete pod nginx
$ kubectl delete -f nginx-lb-bgp-dev.yaml -f frontend-ingress-cilium.yaml
$ kubectl delete -f ./release/kubernetes-manifests.yaml
$ kubectl delete -f cilium-lb-l3pool-dev.yaml -f cilium-lb-l3pool-ingress.yaml
$ kubectl delete -f cilium-bgp-advertisement.yaml -f cilium-bgp-advertisement-ingress.yaml
$ kubectl delete -f cilium-bgp-cluster.yaml -f cilium-bgp-peer.yaml

If the demo environment is no longer required, delete the EKS cluster to avoid incurring charges.

$ eksctl delete cluster --name <my-cluster> --region <aws-region-code>

 

Conclusion

In this post, I have walked through detailed steps for deploying Cilium networking onto an Amazon EKS cluster with hybrid nodes. I have demonstrated different cluster networking configurations with both static and BGP routing, as well as load balancing options in both L2 and L3 modes for various use cases and scenarios.

To learn more, please refer to the following resources: