AWS re:Invent 2024 - Networking strategies for Kubernetes

7 minute read
Content level: Expert

This blog post summarizes the AWS re:Invent 2024 session "Networking strategies for Kubernetes" presented by Sai Vennam (Principal Solutions Architect, AWS) and Federica Ciuffo (Sr Containers Specialist Solutions Architect, AWS). We'll explore practical approaches to Kubernetes networking challenges, from EKS Auto Mode to service meshes and Gateway API, demonstrated through a retail application scenario.

At AWS re:Invent 2024, Sai and Federica used a role-play format with a fictional retail store application to show how organizations can address common networking challenges. Federica played a cloud architect defining requirements, while Sai implemented solutions as the cluster operator.

Their session covered both basic concepts and advanced scenarios, making it useful for beginners and experienced Kubernetes practitioners alike. You'll learn about the key topics they discussed and the practical solutions they demonstrated.

Simplifying Kubernetes Cluster Operations with Amazon EKS

The speakers began by addressing a common pain point: the operational complexity of managing Kubernetes networking components. Traditionally, platform teams have had to manage multiple components like kube-proxy, CoreDNS, and CNI plugins, often leading to confusion during upgrades and maintenance windows.

To simplify this, they introduced Amazon EKS Auto Mode, a significant enhancement to Amazon EKS announced at re:Invent 2024. While Amazon EKS has always managed the Kubernetes control plane, Auto Mode extends management to critical data plane components:

"Amazon EKS is helping customers manage parts of the data plane as well", explained Sai. "Essentially, with Auto Mode, customers can have Amazon EKS clusters that are ready for production use cases out of the box".

With Auto Mode, AWS manages the full lifecycle of:

  • VPC CNI: This assigns Amazon VPC IPs directly to pods, offering seamless integration with existing AWS networking features like security groups and VPC flow logs
  • CoreDNS: Handles cluster DNS resolution
  • kube-proxy: Manages network rules for pod-to-pod communication
  • AWS Load Balancer Controller: Automatically provisions AWS load balancers based on Kubernetes resources

This approach eliminates the need to manually upgrade these components when the control plane is upgraded, significantly reducing the operational overhead on platform teams.
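As a sketch of what enabling Auto Mode can look like with eksctl (the cluster name and region below are placeholders, not from the session):

```yaml
# Minimal eksctl cluster config with EKS Auto Mode enabled.
# AWS then manages VPC CNI, CoreDNS, kube-proxy, and the
# AWS Load Balancer Controller as part of the cluster lifecycle.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: retail-demo   # placeholder name
  region: us-west-2   # placeholder region
autoModeConfig:
  enabled: true
```

Auto Mode can also be enabled from the AWS console or the `aws eks` CLI when creating a cluster.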

Exposing Applications to the Outside World

The speakers then addressed how to expose Kubernetes applications externally—a requirement for any customer-facing service. They demonstrated how the AWS Load Balancer Controller streamlines this process by automatically provisioning and configuring AWS load balancers based on Kubernetes resources.

Federica explained two primary patterns:

  • Ingress Resources: When you create an ingress with the AWS Load Balancer Controller, it provisions an Application Load Balancer (ALB) for Layer 7 routing, which provides path-based routing and other advanced features
  • Service Resources (type LoadBalancer): These provision Network Load Balancers (NLBs) for Layer 4 load balancing, ideal for non-HTTP protocols
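A sketch of the second pattern, a Service of type LoadBalancer that the controller turns into an NLB (the service name and ports are illustrative, and the annotations assume a recent AWS Load Balancer Controller version):

```yaml
# Service that the AWS Load Balancer Controller provisions as an NLB
# with IP targets (pods registered directly as targets).
apiVersion: v1
kind: Service
metadata:
  name: checkout   # placeholder workload
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
spec:
  type: LoadBalancer
  selector:
    app: checkout
  ports:
  - port: 80
    targetPort: 8080
```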

"The Layer 7 processing happens at a load balancer layer", Federica noted. "This means capabilities like TLS termination can be offloaded from your application teams to the load balancer, and then it can also be managed by AWS Certificate Manager".

In a practical demonstration, Sai showed how a simple ingress resource with appropriate annotations creates an internet-facing ALB that routes traffic to a retail application. This approach integrates deeply with AWS's networking stack, helping you use existing AWS capabilities like certificate management and security controls.
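An ingress along the lines of what was demonstrated might look like the following sketch (the resource and service names are placeholders; the annotations are standard AWS Load Balancer Controller annotations):

```yaml
# Internet-facing ALB provisioned by the AWS Load Balancer Controller,
# routing all paths to the retail UI service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: retail-ui   # placeholder name
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: retail-ui
            port:
              number: 80
```

Additional annotations can attach an ACM certificate for TLS termination at the load balancer, as Federica described.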

Enhancing Network Resiliency with Amazon Application Recovery Controller

Moving beyond basic connectivity, the speakers addressed how to improve resilience during infrastructure failures. They introduced Amazon Application Recovery Controller (ARC) integration with Amazon EKS, which addresses a critical gap in standard Kubernetes behavior.

"The problem is that there is a delay", Federica explained when discussing traditional failure detection. "We need to wait for those health checks to fail, and then the targets will be deregistered".

ARC enhances resilience by:

  • Immediately signaling AZ failures through the AWS API
  • Automatically deregistering endpoints in affected availability zones without waiting for health checks to fail
  • Supporting disaster recovery testing without actual AZ failures

This integration substantially reduces recovery time during incidents and gives teams the ability to proactively test recovery scenarios—a critical capability for mission-critical applications.
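As a sketch, enabling zonal shift on an EKS cluster might look like this eksctl fragment (assuming eksctl's `zonalShiftConfig` field; the cluster name and region are placeholders):

```yaml
# Enabling ARC zonal shift support on an EKS cluster, so traffic can be
# shifted away from an impaired AZ without waiting for health checks.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: retail-demo   # placeholder name
  region: us-west-2   # placeholder region
zonalShiftConfig:
  enabled: true
```

The same setting can be applied to an existing cluster via `aws eks update-cluster-config`.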

Service Meshes and Istio Ambient Mesh

The most technical portion of the presentation addressed service meshes—an increasingly popular approach for managing service-to-service communication. The speakers acknowledged that while service meshes provide powerful security, observability, and traffic management capabilities, traditional sidecar-based implementations introduce significant operational challenges.

"As the platform team, I've been told all service-to-service communication has to be encrypted, has to go over mutual TLS", Sai explained, highlighting the security requirements many organizations face.

To address both the need for service mesh capabilities and the operational challenges, they demonstrated Istio Ambient Mesh, which provides:

  • Zero-Downtime Implementation: Unlike traditional sidecars, Ambient Mesh implements mutual TLS without requiring pod restarts
  • Layered Adoption: You can start with Layer 4 security (mutual TLS) and gradually adopt Layer 7 capabilities as needed

Sai demonstrated how configuring Ambient Mesh required just labeling a namespace:

kubectl label namespace default istio.io/dataplane-mode=ambient

With this simple command, pod-to-pod traffic immediately switched to encrypted mutual TLS without any pod restarts—solving one of the biggest adoption barriers for service meshes.
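By default, ambient mode upgrades traffic to mutual TLS where both sides are in the mesh. To make encryption a hard requirement rather than best-effort, a PeerAuthentication policy can enforce strict mTLS; a minimal sketch (the policy name and namespace are illustrative):

```yaml
# Reject any plaintext traffic to workloads in the default namespace;
# only mTLS connections from mesh peers are accepted.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls   # placeholder name
  namespace: default
spec:
  mtls:
    mode: STRICT
```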

For deeper observability and Layer 7 capabilities, Sai showed how adding a waypoint proxy provided detailed traffic visualization in Kiali (an Istio observability dashboard) and supported chaos engineering experiments by injecting faults to test application resilience.
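In current Istio releases a waypoint is itself declared as a Kubernetes Gateway resource (typically generated with `istioctl waypoint apply`); a minimal sketch, assuming the standard `istio-waypoint` gateway class:

```yaml
# Waypoint proxy for the default namespace, handling Layer 7
# processing (HTTP telemetry, routing, fault injection) for
# services enrolled to use it.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint   # conventional name used by istioctl
  namespace: default
  labels:
    istio.io/waypoint-for: service
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
```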

Perhaps most importantly, the speakers demonstrated Ambient Mesh's flexibility with mixed workloads. For their catalog service that used MySQL protocol (incompatible with the ztunnel proxy), they showed how a traditional sidecar could be used just for that specific workload while the rest of the application used the more efficient ambient model.

Kubernetes Gateway API and Multi-Cluster Deployments

In the final segment, the speakers looked toward the future of Kubernetes networking with the Kubernetes Gateway API—the evolution of the Ingress API.

"The Kubernetes ecosystem is not adding new features to ingresses", Federica noted. "The evolution, though, of ingress, that is the Kubernetes Gateway API, has all those capabilities".

They explained how the Gateway API offers:

  • Richer feature set than the traditional Ingress API
  • A unified API for ingress, service-to-service, and egress traffic
  • Better role separation between infrastructure providers and application developers
  • A standard that the entire Kubernetes ecosystem is rallying around

For complex scenarios like blue-green deployments across clusters, they presented two approaches:

  1. Service Mesh Multi-Cluster: Powerful but complex, requiring careful consideration of control plane setup, cross-cluster permissions, and trust boundaries
  2. Gateway API with Multi-Cluster Services API: A more Kubernetes-native approach using ServiceExport and ServiceImport resources
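In the second approach, exporting a service from one cluster is a small declarative step; a sketch using the Multi-Cluster Services API group (the service name and namespace are placeholders):

```yaml
# Export the checkout service so other clusters in the same
# ClusterSet can consume it via a corresponding ServiceImport.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: checkout    # placeholder service
  namespace: retail # placeholder namespace
```

The implementing controller then creates a ServiceImport in consuming clusters, giving workloads a stable cross-cluster DNS name.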

They also explained how you can implement the Gateway API using Amazon VPC Lattice, a Layer 7 network service.

"Amazon VPC Lattice is a network service just like AWS Transit Gateway, just like VPC Peering, but it operates at a different layer, the Layer 7 of the OSI model", Federica explained.

With this service, you can connect applications across VPCs and accounts and even route traffic between Kubernetes clusters and other AWS services like Amazon EC2, AWS Lambda, or Amazon ECS—making it ideal for gradual migrations and hybrid architectures.
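As a sketch of how the pieces fit together with the AWS Gateway API Controller (the resource names and traffic weights are illustrative, and the `amazon-vpc-lattice` gateway class assumes that controller is installed):

```yaml
# Gateway backed by Amazon VPC Lattice, plus a weighted HTTPRoute
# splitting traffic between two service versions (blue-green style).
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: retail-gateway   # placeholder name
spec:
  gatewayClassName: amazon-vpc-lattice
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: checkout-route   # placeholder name
spec:
  parentRefs:
  - name: retail-gateway
  rules:
  - backendRefs:
    - name: checkout      # placeholder "blue" service
      kind: Service
      port: 80
      weight: 90
    - name: checkout-v2   # placeholder "green" service
      kind: Service
      port: 80
      weight: 10
```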

Conclusion

The session demonstrated a thoughtful progression from basic Kubernetes networking with EKS Auto Mode to advanced multi-cluster deployments using the Gateway API and Amazon VPC Lattice. At each step, the speakers balanced theoretical concepts with practical demonstrations, helping you understand not just the "what" but the "why" and "how" of modern Kubernetes networking.

If you want to deepen your knowledge, the speakers recommended the Amazon EKS Workshop for hands-on labs covering networking scenarios, the Amazon EKS Best Practices Guide now integrated into AWS documentation, and reaching out to AWS account managers to request guided Amazon EKS workshops. To gain the full benefit of this valuable session, you can watch the complete presentation on the official AWS YouTube channel, where additional details and demonstrations are available.