AWS re:Invent 2024 - High-performance generative AI on Amazon EKS

8 minute read
Content level: Advanced

This blog post summarizes key highlights from the AWS re:Invent 2024 session "High-Performance Generative AI on Amazon EKS" presented by Mike Stefaniak (Sr. Manager Product Management, AWS), Cas Starsiak (AVP Enterprise Research and AI, Software Product Management, Lilly), and Rama Ponnuswami (Principal Containers Specialist, AWS).

Imagine training an AI model that costs thousands of dollars per hour to run. Now imagine deploying that model to serve thousands of customers simultaneously while maintaining reasonable costs. This is the challenge many companies face as they move beyond AI experiments to production systems.

Companies are now putting generative AI to work in real business applications, not just testing it in labs. At re:Invent 2024, Mike Stefaniak, Rama Ponnuswami, and Cas Starsiak explained this shift from experiments to actual products that serve customers. These AI tools are helping in many ways - making customer service better, saving employees time, creating content, and improving business operations. But running them in production brings new challenges around cost, speed, and security. For teams that need complete control over how their AI systems run, Amazon Elastic Kubernetes Service (Amazon EKS) offers a flexible foundation.

Common Challenges When Scaling Generative AI

Organizations face three primary challenges when scaling generative AI workloads:

  1. Multiple Model Management: Different teams within an organization often require different models customized for specific use cases. This creates complexity in versioning, upgrades, and access control.
  2. Data Integration: Customizing models requires domain-specific data from multiple sources while maintaining security requirements and access controls.
  3. Infrastructure Scale: As deployments grow, managing infrastructure becomes more complex. Organizations need greater control over their environment to optimize costs and performance.

For data scientists and ML engineers, the key challenge is accessing reliable infrastructure without managing it directly. They need a platform that handles infrastructure complexities while allowing them to focus on model development and deployment.

Why Amazon EKS Makes Sense for Generative AI

Amazon EKS addresses these challenges by providing a flexible, customizable platform that gives organizations the control they need while reducing operational complexity. Ponnuswami outlined three key reasons why organizations choose Amazon EKS for generative AI workloads.

First, Amazon EKS helps teams move faster. For organizations that have already standardized on Kubernetes for application development, Amazon EKS allows them to extend their existing platform rather than build something new. Additionally, the open-source ML ecosystem typically provides out-of-the-box Kubernetes integration. "ML space in general is fast moving, and a lot of those innovations are happening in the open source arena," Ponnuswami noted. "These open source tools usually come with out-of-the-box Kubernetes integration."

Second, Amazon EKS provides unparalleled customization capabilities. Organizations gain control of their infrastructure down to the instance level, allowing them to configure environments that meet their specific requirements. This flexibility extends to instance selection, with Amazon EKS supporting all Amazon EC2 instance types, including specialized GPU instances.

Third, Amazon EKS provides seamless scalability and continuous cost optimization. Using tools like Karpenter, organizations can automatically select the right instances for each workload and scale efficiently. GPU sharing mechanisms and multi-tenancy features help increase resource utilization, further optimizing costs as deployments grow.
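Karpenter's core idea — picking the lowest-cost instance type that still satisfies a workload's resource requests — can be sketched in a few lines of Python. The instance types, specs, and hourly prices below are illustrative stand-ins, not live AWS pricing:

```python
# Illustrative sketch of Karpenter-style instance selection: choose the
# cheapest instance type that satisfies a workload's resource requests.
# Specs and prices here are invented for the example.
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int
    gpus: int
    hourly_usd: float

CATALOG = [
    InstanceType("m5.xlarge",    4,  16, 0, 0.192),
    InstanceType("m5.4xlarge",  16,  64, 0, 0.768),
    InstanceType("g5.xlarge",    4,  16, 1, 1.006),
    InstanceType("g5.12xlarge", 48, 192, 4, 5.672),
]

def select_instance(vcpus: int, memory_gib: int, gpus: int = 0):
    """Return the cheapest instance type meeting the request, or None."""
    candidates = [
        it for it in CATALOG
        if it.vcpus >= vcpus and it.memory_gib >= memory_gib and it.gpus >= gpus
    ]
    return min(candidates, key=lambda it: it.hourly_usd, default=None)

# A control-plane pod fits on cheap CPU capacity...
print(select_instance(vcpus=2, memory_gib=8).name)           # m5.xlarge
# ...while an inference pod requesting a GPU lands on GPU capacity.
print(select_instance(vcpus=4, memory_gib=16, gpus=1).name)  # g5.xlarge
```

The real Karpenter weighs many more dimensions (Spot vs. On-Demand, zones, consolidation), but the cost-aware matching shown here is the essence of how mixed CPU/GPU fleets stay efficient.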

Several customer success stories illustrate these benefits. Vannevar Labs achieved a 45% reduction in inference costs by efficiently utilizing mixed CPU and GPU instance types. Informatica built an LLMOps platform on Amazon EKS, achieving 30% cost savings compared to managed services while gaining enhanced configurability. Perhaps most impressively, Hugging Face runs the free tier of their hub on Amazon EKS with over 2,000 nodes, using sophisticated packing strategies and time-sharing for GPU resources.
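The "packing strategies" mentioned for Hugging Face come down to classic bin-packing: fit as many workloads as possible onto as few GPU nodes as possible. A minimal first-fit-decreasing sketch, with invented workload sizes and node capacity:

```python
# First-fit-decreasing bin packing: a simplified stand-in for the GPU
# packing strategies described in the session. Node capacity and
# per-workload GPU-memory demands (GiB) are invented for illustration.
def pack(workloads, node_capacity):
    """Assign each workload to the first node with room, largest first.
    Returns a list of nodes, each a list of workload sizes."""
    nodes = []
    for size in sorted(workloads, reverse=True):
        for node in nodes:
            if sum(node) + size <= node_capacity:
                node.append(size)
                break
        else:
            nodes.append([size])  # no existing node fits: provision a new one
    return nodes

demands = [10, 24, 8, 16, 4, 12, 6]        # per-workload GPU memory, GiB
nodes = pack(demands, node_capacity=40)    # e.g. one 40 GiB GPU per node
print(len(nodes), nodes)  # → 2 [[24, 16], [12, 10, 8, 6, 4]]
```

Sorting largest-first before placing is what keeps fragmentation low: here seven workloads totaling 80 GiB pack perfectly onto two 40 GiB nodes instead of spilling onto a third.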

Recent Amazon EKS Features for Generative AI

AWS has launched numerous features to make Amazon EKS an ideal platform for generative AI workloads. The Amazon EKS control plane has been enhanced to support larger clusters, including work to refactor the etcd management layer so clusters can scale to tens of thousands of nodes. For high-performance multi-node communication, AWS has enhanced the Elastic Fabric Adapter (EFA) integration with Amazon EKS. Recent improvements include support for cross-subnet communication within the same Availability Zone and EFA-only ENI support, which helps address IP address exhaustion in large clusters.

To support the massive datasets required for AI training, Amazon EKS integrates with Amazon S3 through the Mountpoint CSI driver. This allows teams to mount S3 buckets as file systems, with recent enhancements including more fine-grained access controls. Amazon Linux 2023 and Bottlerocket now include accelerated AMIs optimized for generative AI workloads. These AMIs include pre-configured drivers, frameworks, libraries, and CUDA toolkits necessary for AI workloads. "We have a sophisticated testing framework where we're running training and inferencing jobs before we release these AMIs," Stefaniak explained. "You can just take them, run them, and be confident that they're going to work."

For training workloads, Amazon EKS Managed Node Groups now integrate with Amazon EC2 Capacity Block Reservations, allowing organizations to book GPU capacity up to eight weeks in advance. Node health monitoring and auto-repair capabilities now help identify and resolve issues with GPU instances before they cause workload disruptions.

Optimizing Inference Workloads on Amazon EKS

As organizations move to production, inference becomes the primary focus. Running inference at scale introduces specific challenges related to balancing throughput, latency, and cost. "These are the three measurements that we really care about," Stefaniak noted. "The throughput, the latency, and cost, which is really the GPU compute utilization."
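These three measurements can be made concrete with a small calculation. Assuming a GPU instance at a hypothetical hourly price serving a stream of requests over a measurement window (all numbers below are made up):

```python
# Hypothetical numbers: turn raw serving stats into the three metrics the
# session names - throughput, latency, and cost (expressed here as dollars
# per million generated tokens, a proxy for GPU compute utilization).
import statistics

hourly_price_usd = 5.67          # assumed GPU instance price, not real pricing
window_seconds = 60.0            # measurement window
tokens_generated = 180_000       # tokens produced in the window
request_latencies_ms = [210, 250, 240, 900, 230, 260, 220, 245]

throughput_tps = tokens_generated / window_seconds
p50_ms = statistics.median(request_latencies_ms)
cost_per_million_tokens = (hourly_price_usd / 3600 * window_seconds
                           / tokens_generated * 1_000_000)

print(f"throughput: {throughput_tps:.0f} tokens/s")
print(f"p50 latency: {p50_ms} ms")
print(f"cost: ${cost_per_million_tokens:.3f} per 1M tokens")
```

Improving any one metric in isolation is easy (bigger batches raise throughput but hurt latency; smaller instances cut hourly cost but raise cost per token); the optimization task is holding all three in balance.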

Amazon EKS provides several capabilities to help optimize this balance. Karpenter enables fast scaling based on workload requirements, automatically selecting the right instance types. The newly announced Amazon EKS Auto Mode further simplifies this process by integrating Karpenter directly into Amazon EKS and automating many complex setup tasks, such as configuring RAID 0 volumes and installing the appropriate device plugins for specialized hardware.

A common pattern for inference workloads combines Ray and vLLM with Karpenter. This architecture runs control components on standard CPU instances while dynamically spinning up GPU instances for the actual inference processing, providing an optimal balance of performance and cost efficiency.
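The control-plane/worker split in this pattern can be sketched with plain Python objects. In the real architecture, Ray schedules requests and vLLM replicas run on GPU pods that Karpenter provisions on demand; everything below (class names, the saturation threshold) is an illustrative simulation of that structure, not actual Ray or vLLM code:

```python
# Toy simulation of the CPU control plane / GPU worker split: a dispatcher
# routes requests and "scales up" a new worker when existing ones are
# saturated, mimicking Karpenter adding a GPU node. All names are invented.
class GpuWorker:
    """Stands in for a vLLM replica on a GPU node."""
    def __init__(self, worker_id: int):
        self.worker_id = worker_id
        self.handled = 0

    def infer(self, prompt: str) -> str:
        self.handled += 1
        return f"worker-{self.worker_id}: response to {prompt!r}"

class Dispatcher:
    """Stands in for the CPU-hosted control plane."""
    def __init__(self, max_requests_per_worker: int = 2):
        self.workers: list[GpuWorker] = []
        self.max_requests_per_worker = max_requests_per_worker

    def handle(self, prompt: str) -> str:
        free = [w for w in self.workers
                if w.handled < self.max_requests_per_worker]
        if not free:  # all workers saturated: provision a new "GPU node"
            free = [GpuWorker(len(self.workers))]
            self.workers.append(free[0])
        return free[0].infer(prompt)

d = Dispatcher()
for i in range(5):
    d.handle(f"prompt {i}")
print(f"{len(d.workers)} workers after 5 requests")  # scaled to 3 workers
```

The point of the pattern is visible even in the toy: the routing logic never needs a GPU, so it lives on cheap CPU instances, while expensive GPU capacity is created only when demand requires it.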

Real-World Implementation: Eli Lilly's Generative AI Platform

Cas Starsiak shared Eli Lilly's journey building a generative AI platform on Amazon EKS. Despite being a 150-year-old pharmaceutical company, Lilly has embraced digital transformation with the recognition that "we really need to become a tech company to continue succeeding as a medicines company."

Lilly created a platform called CATS (Cloud Applications and Technology as a Service) built on AWS. CATS provides a comprehensive cloud application development and hosting solution that uses Amazon EKS for Kubernetes management. The platform implements GitOps and CI/CD automation, allowing teams to go "seamlessly from commit to production."

Building on CATS, Lilly created a generative AI platform that includes a model library with access to various LLMs, orchestration tools using LangChain, operations and scaling capabilities, evaluation tools, data integrations, information retrieval features, and comprehensive security controls.

Launched in December 2023, the platform has already made a significant impact across Lilly's business with use cases including an agentic discovery assistant for drug discovery, tools for regulatory inquiries, manufacturing assistants, claims drafting tools, and patient-facing Q&A applications.

The platform has seen rapid adoption, with over 500 developers using it, thousands of end users across more than 30 countries, and billions of tokens processed monthly. "Having this platform approach, using a scalable platform like Amazon EKS—it's worked," Starsiak said. "We have developers using it, it's accelerating them, we're seeing things scale, we're seeing things move into production."

Starsiak shared several valuable lessons from their experience. While early adopters naturally gravitate to new platforms, bringing in mainstream users requires investment in evangelism, documentation, and support. Expectations also increase rapidly—initial skepticism quickly turns into demands for formal SLAs and enterprise-grade reliability. Finally, early investment in cybersecurity partnerships proved essential to gaining organizational trust.

Getting Started with Generative AI on Amazon EKS

To help organizations implement generative AI on Amazon EKS, AWS has created the Data on EKS project. This open-source project provides blueprints, patterns, and Terraform templates for common generative AI use cases, helping teams integrate popular open-source solutions with best practices.

The project includes specific patterns for inference workloads, such as NVIDIA with vLLM, Ray with vLLM, and NVIDIA NIM on Amazon EKS. By using these templates, organizations can quickly set up sophisticated ML environments without reinventing fundamental infrastructure components.

Conclusion

As generative AI moves from experimentation to production, organizations need platforms that provide the right balance of flexibility, control, and ease of use. Amazon EKS offers this balance, enabling teams to customize their environments while benefiting from continued AWS innovation in areas like scalability, performance, and cost optimization.

The success of Eli Lilly's generative AI platform demonstrates how Amazon EKS can serve as a foundation for enterprise-wide AI adoption. By building on existing Kubernetes expertise and using the rich ecosystem of open-source ML tools, organizations can accelerate their AI initiatives while maintaining control over their infrastructure and costs. Whether you're just starting your generative AI journey or looking to scale existing workloads, Amazon EKS provides the capabilities needed to run high-performance generative AI in production. Through continued innovation and community engagement, AWS is committed to making Amazon EKS the best platform for organizations looking to harness the power of generative AI.

For those interested in learning more, you can explore the Data on EKS project, check out the EKS Best Practices Guide, or watch the full session recording on the AWS YouTube channel.