AWS at KubeCon + CloudNativeCon Europe 2024

23 minute read
Content level: Foundational

📣 📅 Mark your calendars! We're headed to Paris and want to see you there.

🚀 Join AWS speakers at KubeCon Europe as they dive into the latest open source innovations that make AWS the best place for customers to build and run open source software in the cloud. AWS conference speakers will be talking about Karpenter, Argo CD apps, the integration of AI in the CloudNative world, multi-tenant scalable Prometheus with Cortex, eBPF, the Cloud Native CAKES stack for Zero Trust, chaos engineering, Kubernetes controllers, OpenTelemetry, and zonal outages. Don't miss this opportunity to enhance your knowledge and connect with AWS experts. Add these sessions to your schedule and meet us there. À bientôt à Paris!

You can also visit us at the AWS booth in the Solutions Showcase for AWS swag 🎁 and hands-on demos 💻 covering cost optimization, platform and GitOps strategy, and the latest in data and AI/ML.


Tuesday, March 19

ArgoCon

Harnessing Karpenter: Transforming Kubernetes Clusters with Argo Workflows - Carlos Santana & Raj Saha, AWS

Discover the future of Kubernetes cluster scaling with Karpenter, the latest and most rapid Kubernetes-native cluster autoscaler, now part of the CNCF ecosystem. While Cluster Autoscaler remains widely used among Kubernetes enthusiasts, this session introduces a groundbreaking approach to transitioning your worker nodes and pods to Karpenter with ease and efficiency. Join us for an interactive demonstration where we'll explore the powerful synergy of Argo Workflows and Karpenter. You'll learn how to seamlessly migrate your Kubernetes resources, capitalizing on Argo Workflows' flexibility and its unique capability to execute CI pipelines within the cluster. This not only enhances your security posture but also adeptly manages challenges such as intermediate failures and time-intensive tasks in large-scale node roll-overs. Click here to add this session to your event schedule!

Adobe/AWS: Key Takeaways from Scaling Adobe's CI/CD Solution to Support >50K Argo CD Apps - Andrew Lee, AWS & Vikram Sethi, Adobe

Adobe Flex is a CI/CD solution started inside Adobe in 2022 with widespread adoption in Q2 2023. The broader adoption of the Flex solution quickly revealed some stability and scalability challenges with Flex struggling at around 1500 Argo CD applications. This, along with other scalability challenges faced by other AWS customers, led to a joint community effort between AWS, Adobe, Akuity, and other community members to also kick off the #argo-sig-scalability working group to collaborate on common patterns and practices around using Argo at scale. Fast forward to today, Adobe has been able to successfully scale Argo CD to run more than 9000 Argo CD applications comfortably and can easily support 5X more by adding scale on demand. In this talk, we will talk about how Adobe partnered with AWS to solve the stability and scalability challenges in getting to current scale and design a linearly scalable multi-tenant sharding architecture to easily add 5X more scale on demand, and beyond. Click here to add this session to your event schedule!

Cloud Native AI Day

Panel: Beyond the Clouds: Charting the Course for AI in the CloudNative World - Rajas Kakodkar, VMware; Ricardo Aravena, TruEra; Alolita Sharma, Apple; Alex Jones, AWS; Cathy Zhang, Intel

With Kubernetes being the de facto choice of orchestration platform for AI, have you wondered how AI can benefit from CloudNative technologies? Join this discussion to look beyond the horizon of CloudNative and AI integration. This discussion, with maintainers of projects like k8sgpt and industry leaders in the CloudNative for AI space, will address:
- How CloudNative principles like OCI registries and distributed architecture can benefit AI
- How telemetry and observability can be leveraged to profile an AI app
- The importance of and need for collaboration between MLOps practitioners and cloud-native technologists
- How software supply chain best practices can be incorporated in AI model pipelines
- How GPU/CPU resources can be better scheduled to deliver higher AI performance
- How GPU partitioning and sharing can increase resource utilization and reduce AI cost
The audience will learn how to contribute to the effort of CloudNative and AI benefiting from each other! Click here to add this session to your event schedule!

Wednesday, March 20

AWS booth at the Solutions Showcase

You can read more about each demo at the bottom of the page.

Time | Cost Optimization station | Data & AI/ML station | Platform & GitOps Strategy station
10:45 | Building high-performance apps & controlling costs with CNCF projects (Karpenter and KEDA) | Distributed Training on EKS - Tools, Tips, Tricks & Best Practices | Amazon EKS multi-cluster topologies: The GitOps Bridge pattern
12:00 | Cost attribution for shared Amazon EKS clusters is easier than ever | Notebooks as a service | Tools and best practices for building your Internal Developer Platform (IDP)
13:30 | Managed Open Source & Native Observability with AWS | Modernize data processing with Spark on Amazon EKS | Chaos Engineering with AWS Fault Injection Simulator and CNCF projects
15:00 | Optimize cost and improve scalability with Karpenter on Kubernetes | Optimizing Kubernetes for High-Performance Workloads on AWS with NVIDIA | GitOps: Unlocking the power of Kubernetes cluster management
16:30 | Building high-performance apps & controlling costs with CNCF projects (Karpenter and KEDA) | Build end-to-end ML platforms that accelerate innovation with the JARK stack | Kubernetes threat detection in multiple layers
18:00 | Cost optimization for EKS with Karpenter, Spot and Graviton | Generative AI on Amazon EKS: text-to-image generation with AWS Inferentia | Amazon EKS multi-cluster topologies: The GitOps Bridge pattern

KubeCon

Cortex Intro: Multi-Tenant Scalable Prometheus - Ben Ye

Cortex provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus. In this talk, Ben will introduce the Cortex architecture and project status. He will also walk through the new features added to Cortex throughout 2023 and how to use them efficiently in production. Click here to add this session to your event schedule!

Troubleshooting Hidden Performance and Costs in Network Traffic Across Multiple AZs with eBPF - Shahar Azulay, Groundcover & Nirmal Mehta

Spanning Kubernetes clusters across multiple Availability Zones is common when optimizing for resiliency, but it introduces challenges such as network performance and cost when workloads communicate with each other across AZs. AZs are designed for low round-trip latency between different AZs in the same region, yet in a modern microservices application a single request can trigger multiple interactions that cross an AZ boundary over and over again, through several network layers including Application Load Balancers and Kubernetes proxies. This can create an aggregate effect on latency and performance that is usually hard to detect and troubleshoot, and on cost, since data transfer charges apply for cross-AZ communication. Extended Berkeley Packet Filter (eBPF) offers unparalleled visibility into the network stack of a Kubernetes cluster. It can be used to uncover concealed performance bottlenecks and understand the nuanced cost implications of network requests across AZs in Kubernetes. Click here to add this session to your event schedule!

Poster Session: Serve CAKES for Your Developers: Introducing the Cloud Native CAKES Stack for Zero Trust! - Lin Sun, solo.io & Davanum Srinivas

Who can resist the allure of cakes? In this session, Lin and Dims (maintainers from Istio and Kubernetes) will unveil the CAKES stack, a zero trust composition using five widely adopted CNCF graduated projects:
- Cilium (C): An innovative CNI based on revolutionary eBPF.
- Istio Ambient (A): The most deployed service mesh in production with the new sidecar-less data plane choice.
- Kubernetes (K): The de facto platform for managing containerized workloads and services.
- Envoy (E): A high-performance proxy for API gateways.
- Spire (S): A production-ready SPIFFE implementation to attest workload identities.
They will delve into the technical requirements for establishing an effective zero trust architecture and showcase through a live demo how combining these projects results in a powerful, open, and extensible platform, enabling developers to secure their cloud native applications with zero trust principles while ensuring consistency and reliability. Click here to add this session to your event schedule!

Tutorial: Chaos Unleashed Workshop: Embrace the Chaos in Real-Time! - Nele Lea Uhlemann, Fiberplane & Guillermo Ruiz, AWS

Get ready for a hands-on chaos engineering workshop that takes interactive learning to the next level! Picture yourself immersed in a chaotic environment, guided by a captivating narrative, and empowered to shape both the chaos and its resolution. G and Nele will kick off with a ChaosToolkit-driven simulation of pod and network failures in a production Kubernetes environment. Attendees, using a voting app, will collaboratively troubleshoot the system, exploring insights by examining logs, metrics, traces, and terminal outputs. Essential tools, such as Prometheus, alongside visualization in Perses, will play a key role in the workshop. Recognizing that humans are integral to such scenarios, dice rolling will introduce some unplanned human chaos. This workshop combines the excitement of a live-action game with the principles of chaos engineering, delivering a unique and unforgettable experience. Join the open chaos and find out if we can escape from it in 90 minutes. Click here to add this session to your event schedule!

Future of Intelligent Cluster Ops: LLM-Azing Kubernetes Controllers - Rajas Kakodkar, VMware & Amine Hilaly, AWS

As a Kubernetes operator, you must have spent countless hours upgrading clusters, deploying complex applications and troubleshooting issues. Have you ever wondered if you could automate this and literally speak to your cluster by asking:
- Is it safe to upgrade to v1.29?
- Why isn't Node X Ready?
Join this session by Rajas and Amine to discover how AI can empower cluster operations with K8s controllers backed by LLMs. Discover the stages of data processing, fine-tuning LLMs and integrating them with K8s controllers and CRDs. And witness the addition of Speech Recognition to the K8s controller to operate clusters. To unravel the myths of AI hype, there will be a live demo to "talk" to a K8s controller powered by an LLM for:
- Auditing and upgrading clusters
- Simulating Chaos scenarios
- Scanning clusters for CVEs
- Observability of cluster health
The audience will get to know how domain knowledge helps improve AI model accuracy to ensure that it follows data ethics and security principles. Click here to add this session to your event schedule!

Thursday, March 21

AWS booth at the Solutions Showcase

You can read more about each demo at the bottom of the page.

Time | Cost Optimization station | Data & AI/ML station | Platform & GitOps Strategy station
10:30 | Cost attribution for shared Amazon EKS clusters is easier than ever | Distributed Training on EKS - Tools, Tips, Tricks & Best Practices | Chaos Engineering with AWS Fault Injection Simulator and CNCF projects
12:00 | Managed Open Source & Native Observability with AWS | Notebooks as a service | GitOps: Unlocking the power of Kubernetes cluster management
13:30 | Optimize cost and improve scalability with Karpenter on Kubernetes | Generative AI workload observability for Amazon EKS with Container Insights | Kubernetes threat detection in multiple layers
15:00 | Cost optimization for EKS with Karpenter, Spot and Graviton | Build generative AI applications on EKS with S3, EFS, and the open source CSI driver | Amazon EKS multi-cluster topologies: The GitOps Bridge pattern

KubeCon

Tutorial: Exploring the Power of Distributed Tracing with OpenTelemetry on Kubernetes - Pavol Loffay & Benedikt Bongartz, Red Hat; Matej Gera, Coralogix; Anthony Mirabella, AWS; Anusha Reddy Narapureddy, Apple

Rolling out an observability solution is not a straightforward problem. There are many solutions and the final architecture can impact the effectiveness, robustness, and long-term maintenance aspects of the architecture. In this comprehensive tutorial, we will deploy an end-to-end distributed tracing stack on Kubernetes using the OpenTelemetry project. The tutorial will cover both manual and auto-instrumentation, extending the auto-instrumentation, collecting data with the OpenTelemetry collector and performing transformation on spans using OTTL, tail-based sampling, deriving metrics from traces, tracing with proxies/service meshes and collecting traces from Kubernetes infrastructure. After this session, the audience will be able to understand and use OpenTelemetry API/SDK, auto-instrumentation, collector, and operator to roll out a working distributed tracing stack on Kubernetes. Click here to add this session to your event schedule!

Intro + Deep Dive: Kubernetes SIG Scalability - Wojciech Tyczyński, Google & Shyam Jeedigunta, Amazon Web Services

This session will cover the different efforts that SIG Scalability is involved in: defining what scalability means for Kubernetes, driving performance improvements, maintaining infrastructure for scalability testing, and guarding Kubernetes against performance regressions. In addition to a general overview, the most recent achievements and challenges are the main focus of the presentation. Cooperation with other SIGs is an important aspect of the presentation, as many improvements driven by the SIG are in fact owned by other SIGs. Time for Q&A will be reserved at the end of the session to understand how the SIG can better engage with the community and to allow the audience to provide input on the roadmap. Click here to add this session to your event schedule!

Zonal Outage Operational Stories - Jyoti Ranjan Mahapatra & Shyam Jeedigunta, Amazon Web Services

Most datacenters have a notion of "availability zone" as a failure domain. Correlated failures are expected in a single failure domain. Kubernetes cluster administrators deploy the Kubernetes control plane, worker nodes, and pods in a topological spread that can tolerate a single fault domain failure. Such setups achieve high availability and gracefully handle common zonal failures: network partitions, power loss, reboots, bad software deployments, and so forth. This talk walks through numerous real-world zonal outages, across a spectrum from partial to full outage, and the behavior of Kubernetes components in those situations. The speakers operate a large fleet of Kubernetes control planes in Amazon Web Services; they will share stories of zonal outages and improvements that helped achieve greater resiliency for thousands of clusters. Click here to add this session to your event schedule!

Kubernetes Maintainers Read Mean Comments - Tim Hockin, Google & Davanum Srinivas, Amazon Web Services

Being a maintainer of a large open-source project can sometimes be a thankless job. While most of our users are wonderful, sometimes things get heated, and occasionally people say something that just goes too far. Thankfully, we maintainers have each other to lean on, and good senses of humor. This session is a reminder that maintainers are just normal people, often doing this work out of passion. We share these nuggets to vent a little and to poke fun at ourselves as well. We love our users and the community who use our work. They are the most creative people and often build things we did not even think possible, but sometimes it gets to be a little too much ... and this is our escape valve. Click here to add this session to your event schedule!

Kubernetes SIG Architecture Intro and Updates - John Belamaric, Google & Davanum Srinivas, AWS

SIG Architecture maintains and evolves the design principles of Kubernetes, and provides a consistent body of expertise necessary to ensure architectural consistency over time. The SIG takes care of the evolution of conformance definitions, API definitions/conventions, deprecation policy, design principles, and other cross-cutting concerns. In this talk, we will provide an introduction to SIG Architecture, including its role and the various subprojects that support its activities. Additionally, we will provide a community update on the status of those efforts. Click here to add this session to your event schedule!

SIG Autoscaling Updates and Feature Highlights - Guy Templeton, Skyscanner; Jonathon Innis, AWS; Maciek Pytel, Google

Since its adoption by SIG Autoscaling in the lead-up to KubeCon North America 2023, Karpenter has continued to develop its roadmap and integrations, allowing even more cluster operators to make use of it. Come hear the latest on the new features we've delivered and what we're planning for the future. If you're interested in the future of the project, want to get involved yourself and help move the project forward, or just have feedback on your experience, come along! Click here to add this session to your event schedule!

Friday, March 22

AWS booth at the Solutions Showcase

You can read more about each demo at the bottom of the page.

Time | Cost Optimization station | Data & AI/ML station | Platform & GitOps Strategy station
10:30 | Building high-performance apps & controlling costs with CNCF projects (Karpenter and KEDA) | Build end-to-end ML platforms that accelerate innovation with the JARK stack | Kubernetes on your Mac? Yes! Using Finch with KinD
12:00 | Cost attribution for shared Amazon EKS clusters is easier than ever | Generative AI on Amazon EKS: text-to-image generation with AWS Inferentia | Chaos Engineering with AWS Fault Injection Simulator and CNCF projects

KubeCon

Keynote: Cloud Native in its Next Decade - Davanum Srinivas, Principal Engineer, AWS & Lin Sun, Head of Open Source, solo.io

When we started CNCF in 2015 to help advance container technology, Kubernetes was the seeding technology to provide a de facto container orchestration platform for all cloud native applications. Almost a decade later, the community has exploded with 180+ open source projects building on top of cloud native technologies. Looking ahead, what challenges will we have in the next decade? They will be vastly different for our users and contributors from today. Let us review some of the key CNCF projects today and lay out some possible avenues for where cloud native is going in the next decade: AI, sustainability, edge computing, security, service mesh, WebAssembly and more. Right or wrong, we'll find out at KubeCon 2034! Click here to add this session to your event schedule!

AWS booth demos

Cost Optimization

Optimize cost and improve scalability with Karpenter on Kubernetes

Karpenter is an open source auto scaling solution that simplifies Kubernetes by launching the right compute resources for the application workloads while responding quickly and automatically to changes in application load and resource requirements. In this session, learn how Karpenter can lower compute costs by replacing expensive nodes with cheaper alternatives, removing under-utilized nodes, and consolidating workloads into more efficient compute resources. Hear how Karpenter can significantly improve efficiency and the cost of running workloads on a given cluster. It requires very little to no configuration, providing a better operational experience for cluster administrators.

Cost attribution for shared Amazon EKS clusters is easier than ever

Kubecost is widely used by customers running their workloads on Amazon EKS. It makes cost allocation and attribution simple, which allows you to easily charge shared resources across teams. In this demo, see how you can configure Amazon Managed Service for Prometheus to store Kubecost metrics and monitor costs across multiple Amazon EKS clusters at no cost.

Building high-performance apps & controlling costs with CNCF projects (Karpenter and KEDA)

Modern applications are composed of diverse design patterns, such as event-driven architectures, microservices, and data on Kubernetes, among others. Due to the unique nature of these applications, they require scaling based on metrics beyond the traditional CPU and memory usage. In this session, learn how to use CNCF Karpenter (part of Kubernetes Autoscaling SIG) and CNCF KEDA to scale your application from zero to (near) infinity and back to zero, ensuring performance meets the desired SLOs while considering cost optimization.

Managed Open Source & Native Observability with AWS

Attendees will interact with a demo of the AWS Observability Accelerator, a set of opinionated Terraform/CDK modules designed to streamline Amazon EKS cluster observability with AWS-managed open source and native services. We will walk attendees through Amazon CloudWatch Container Insights, which now delivers enhanced observability for Amazon EKS with detailed health and performance metrics, including container-level performance metrics, kube-state metrics, and control plane metrics for faster problem isolation and troubleshooting. We will also showcase CloudWatch Application Signals, which provides application and infrastructure monitoring for Java workloads on Amazon EKS.

Data and AI/ML on AWS

Modernize data processing with Spark on Amazon EKS

Spark has become the go-to data processing tool because of its speed, scalability, and community support. Since Spark 2.3 introduced support for Kubernetes, users have been moving their Spark jobs to Kubernetes to create more reliable, scalable, resource-efficient data processing platforms. This session demonstrates how to deploy and scale Spark on an Amazon EKS cluster using Spark Operator to manage Spark jobs, Apache YuniKorn for batch scheduling, and Karpenter for autoscaling to create a reliable, scalable, cost-efficient data processing environment on Kubernetes.

Build end-to-end ML platforms that accelerate innovation with the JARK stack

Generative AI is transforming the way businesses function and is accelerating the pace of innovation, and organizations need to build the platforms that allow their teams to innovate. Gen AI technology involves tuning and deploying Large Language Models (LLMs), but a multitude of tools are needed to support this process end-to-end. You'll need notebooks so developers and data scientists can experiment to build models, access to open-source ML models for training, orchestration tools to manage the complex workflows for preparing data and training models, and tools to deploy and serve the models in production. This demo shows an end-to-end ML stack on Kubernetes using Amazon EKS, with JupyterHub for notebooks, Argo Workflows for job orchestration, and Ray to train and serve ML models, a combination we call the JARK stack. It also demonstrates how to use this stack to deploy a Stable Diffusion text-to-image model.

Distributed Training on EKS - Tools, Tips, Tricks & Best Practices

As your data and model size grow, so does the importance of distributed model training. This demo will explore how Amazon EKS, along with PyTorch and Ray, enables efficient distributed training by leveraging cloud elasticity. We'll cover the integration of these tools with EKS and share best practices for maximizing your EKS clusters for distributed training, offering a deep dive into harnessing EKS's powerful ML capabilities for handling extensive datasets swiftly and effectively.

Generative AI on Amazon EKS: text-to-image generation with AWS Inferentia

Using a dedicated silicon layer like AWS Trainium and AWS Inferentia to streamline the inferencing process on Kubernetes marks a significant stride in the evolution of generative AI. Optimizing this crucial computational phase can help pave the way for more efficient and advanced generative AI developments. In this demo, discover how Amazon EKS powers cost-effective deployment of high-quality generative AI models for real-time inferencing using Trainium and Inferentia. Explore how to deploy the Stable Diffusion XL base model on Amazon EKS using Ray Serve. Learn how to harness the power of Trn1/Inf2 instances optimized for LLMs, and witness the fusion of AI and Kubernetes to create efficient, self-managed ML platforms.

Generative AI workload observability for Amazon EKS with Container Insights

This session walks through how you can use Amazon CloudWatch Container Insights to monitor your NVIDIA GPU health on Amazon EKS to optimize AI/ML workloads. Interact with Container Insights enhanced observability for Amazon EKS to monitor detailed health and performance metrics while operating with opinionated observability for faster problem isolation and troubleshooting.

Notebooks as a service

Jupyter notebooks have become the go-to development environment for data scientists, machine learning engineers, and a diverse array of professionals. In this session, learn how Jupyter notebooks seamlessly blend code, visualizations, and explanatory text together, creating an interactive narrative of data exploration, preparation, and ML model development.

Optimizing Kubernetes for High-Performance Workloads on AWS with NVIDIA

Explore how NVIDIA and AWS together ensure high performance for demanding applications in Kubernetes environments. This session delves into strategies for allocating compute resources effectively, leveraging NVIDIA's Kubernetes enhancements and AWS infrastructure to achieve optimal performance and low-latency outcomes.

GitOps and Platform Strategy

Tools and best practices for building your Internal Developer Platform (IDP)

We will discuss how to get started with deploying the IDP reference implementation easily using IDPBuilder. IDPBuilder is an open source tool developed by the Cloud Native Operational Excellence (CNOE) group that allows you to express your own IDP with Docker being the sole dependency. We will demonstrate how IDPBuilder accelerates the development and delivery of platform capabilities across diverse environments. We will also show how one can run this tool to automatically validate and test platform capabilities in CI jobs without using brittle scripts.

GitOps: Unlocking the power of Kubernetes cluster management

Join this session to discover the ins and outs of provisioning and configuring Kubernetes clusters using the cutting-edge GitOps approach. Witness the seamless integration as we demonstrate how to build an event-driven workflow to automatically register provisioned clusters to the central GitOps platform, empowering developers to effortlessly deploy their applications without the hassle of manual configuration. Through live demonstration, see how popular open source tools like Argo CD, Argo Events, Argo Workflows, and Crossplane seamlessly work together to create a powerful and efficient provisioning pipeline.

Amazon EKS multi-cluster topologies: The GitOps Bridge pattern

With the growing need for scalable and resilient multi-cluster environments, the combination of infrastructure as code (IaC) and Argo CD using Amazon EKS Blueprints brings automation and declarative management to the forefront of the deployment process. This session showcases practical examples, hands-on interaction with the technology stack, and insights into best practices. Learn how to bridge the gap between IaC and applications, streamline deployments, and harness the full power of cloud-native solutions in multi-cluster Kubernetes environments. Join us to embrace the future of GitOps and discover how to modernize your development and operations in the AWS Cloud.

Kubernetes threat detection in multiple layers

This session features a live demo with hands-on lab materials, where you learn how to detect, alert on, and investigate potential threats and suspicious security events in Amazon EKS. See how you can do this using Amazon GuardDuty's different layers of protection, including malware protection, Amazon EKS Audit Log Monitoring, and Amazon EKS Runtime Monitoring.

Chaos Engineering with AWS Fault Injection Simulator and CNCF projects

Build fault-tolerant apps with chaos engineering. Join this session to see a live demo using a chaos engineering version of your favorite arcade game and see how fault-tolerant, well-architected applications can make the most of Spot Instances. See how Karpenter implements best practices for instance diversification, leading to cost-optimized clusters. Hear how you can use AWS Fault Injection Simulator and AWS Resilience Hub to test the resiliency of your applications and cluster setup.

Ask an Expert

Our "Ask the AWS Expert" demo station will be a dedicated space for attendees to engage directly with AWS experts. This station will provide a unique opportunity for walk-up visitors to get their questions answered, engage in in-depth discussions, and utilize whiteboarding to dive deeper into their specific challenges and inquiries.