Get the latest on AWS AI Chips from re:Invent 2024
Catch up on the key announcements and discover how industry leaders like Apple and Anthropic are advancing AI with AWS Trainium and Inferentia
Authored by Armin Agha-Ebrahim, GTM Specialist, AWS AI Chips
This year at re:Invent, we announced the general availability of Amazon EC2 Trn2 instances and a first-of-its-kind preview of Trn2 UltraServers, powered by AWS Trainium2! We heard from industry leaders like Anthropic, Apple, Qualcomm, Adobe, poolside, Databricks, Amazon Rufus, IBM, Ricoh, and many others about how they are using Trainium and Inferentia to deliver better generative AI solutions to their customers while lowering costs. We also unveiled Project Rainier with Anthropic, a compute cluster delivering 5x the exaflops of their current training cluster. Amazon Bedrock introduced latency-optimized inference, powered by Trn2 instances, to offer customers faster LLM inference. And we pre-announced AWS Trainium3!
For the machine learning and AI crowd, we offered many sessions about our purpose-built ML accelerators: AWS Inferentia and AWS Trainium. These sessions dive deep into how Trainium2 enables the highest inference and training performance on AWS, offer hands-on labs for optimizing models for peak performance at lower cost, and show how to build sustainable solutions that accelerate deep learning applications in the cloud.
Missed a session or two during the bustling week in Las Vegas? No problem! We’ve got you covered. Here’s a summary of all the keynote highlights and breakout session recordings for AWS Trainium and Inferentia. Take your time to explore them all in one place. Enjoy!
Keynote Highlights
Monday Night Live with Peter DeSantis
Peter did a technical deep dive on Trainium2, Trn2 instances, and the all-new Trn2 UltraServers. He shared some of the technical innovations at the chip, instance, and UltraServer levels that enable Trainium2 to meet the growing demands of AI developers and data scientists pushing the boundaries of building and serving generative AI models at scale. He was joined on stage by Tom Brown, co-founder and Chief Compute Officer at Anthropic, who discussed how Anthropic is leveraging Trainium2 to deliver 60% faster model serving for its latest-generation Claude 3.5 Haiku models, and announced Project Rainier, a new compute cluster built on hundreds of thousands of Trainium2 chips that delivers 5x more exaflops than their current training cluster.
AWS re:Invent 2024 CEO Keynote with Matt Garman
Matt announced the general availability of Trn2 instances and launched the Trainium2 UltraServer in preview, with 64 Trainium2 chips all interconnected by high-speed, low-latency NeuronLink. He shared how Trainium2 delivers 30-40% better price performance than current GPU-powered instances, and how the Trainium2 UltraServer delivers an impressive 83.2 PFLOPS of dense FP8 compute, up to 332 PFLOPS of sparse compute, and 6 TB of HBM capacity. Benoit Dupin, Apple's senior director of machine learning and AI, joined Matt to talk about how Apple uses Graviton and Inferentia today and its plans for the new Trainium2 chips. With Graviton, Benoit shared, Apple has realized over 40% efficiency gains by migrating from x86 instances. With Inferentia, Apple was able to run some of its search text features twice as efficiently after moving from G4 instances to Inferentia2. And Apple is already in the early stages of evaluating Trainium2, expecting up to 50% improvement in pre-training efficiency with its help.
Breakout Highlights
AWS Trainium2 for breakthrough AI training and inference performance
This session introduces the new Amazon EC2 Trn2 instances, powered by Trainium2, our second-generation Trainium chip, and the highest-performing EC2 instances for deep learning and generative AI. It also introduces the Trn2 UltraServer, the highest-performance ML server in EC2, with 83.2 PFLOPS of dense compute and 185 TB/s of HBM bandwidth. Anthropic selected Trainium for several reasons: the incredible price performance, the flexible and programmable chip architecture, the Trn2 UltraServers for scale-out training and inference of large models, and the low-level access provided by features like the Neuron Kernel Interface (NKI).
Conquer AI performance, cost & scale with AWS AI chips
Trainium2 tackles the industry's challenges of performance, cost, and scale. One customer realizing the benefits of AWS's purpose-built AI chips is Amazon's Rufus, which ran on over 80,000 Trainium and Inferentia chips on Prime Day. Another is poolside, who shared a few optimizations that led to a 72% performance increase on Trainium2 compared to Trn1. Lastly, Matt Johnson of Google DeepMind demonstrated batching and model parallelism in a live JAX-on-Trainium demo; a minimal sketch in that spirit follows below. Watch this breakout to see the different ways Trainium2 is empowering customers like Amazon Rufus, poolside, and the JAX community.
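To make the demo's ideas concrete, here is a minimal JAX sketch of a batched, sharded computation. It illustrates the batching and model-parallelism concepts rather than reproducing the actual demo code, and runs on any JAX backend; on Trn2 it would go through the Neuron JAX plugin (assumed setup, not shown).

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh; on Trn2 these devices would be NeuronCores.
mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

# Shard the weight matrix column-wise across devices (model parallelism).
w = jax.device_put(
    jax.random.normal(jax.random.PRNGKey(0), (1024, 4096)),
    NamedSharding(mesh, P(None, "model")),
)

@jax.jit
def forward(x, w):
    # Batched matmul; the compiler inserts the collectives the sharding implies.
    return jnp.dot(x, w)

x = jnp.ones((32, 1024))    # a batch of 32 examples
print(forward(x, w).shape)  # (32, 4096)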
Customer stories: Optimizing AI performance and cost with AWS AI chips
This breakout showcases how industry leaders Ricoh, Arcee.ai, IBM, and ByteDance are leveraging AWS Trainium to drive innovation and advance their AI capabilities. Ricoh deployed the first large-scale cluster for Japanese-language LLM training, with 256 Trn1 nodes on AWS, lowering training costs by 50% and reducing training time by 25% compared to GPUs. Arcee.ai is harnessing Trainium to improve the cost performance of its SuperNova (70B) models by 300% compared to high-end GPU instances. IBM shares how its partnership with AWS "is one of their most exciting partnerships," as IBM's Granite models are deployed on Amazon's AI chips, offering "very good latency and high performance." ByteDance utilized AWS Inferentia to deploy its multi-modal models globally, achieving 20% higher throughput and 13% lower costs than with GPUs. Watch this breakout to explore how AWS's AI chips are empowering customers like Ricoh, Arcee.ai, IBM, and ByteDance in diverse and innovative ways.
Workshops, Builders Sessions, and Hands-On Examples
At re:Invent this year, we offered many hands-on sessions to give developers and builders time to start their journey with AWS AI Chips, Trainium and Inferentia, or to learn new skills. These sessions are also available online as self-paced workshops via the links below.
Fine-tune Hugging Face LLMs using Amazon SageMaker and AWS Trainium: This workshop focuses on fine-tuning and deploying large language models (LLMs) with Amazon SageMaker, Trainium, and Inferentia. You'll explore efficient fine-tuning via LoRA, leverage AWS's ML accelerators for training and inference, and use SageMaker for seamless orchestration. Learn to enhance model performance cost-effectively and at scale.
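For a flavor of the LoRA technique this workshop covers, here is a hedged sketch using the Hugging Face peft library. The model ID is a placeholder, and on Trainium the workshop uses optimum-neuron's drop-in trainer classes rather than this generic setup.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-0.5B"  # placeholder; the workshop supplies its own model
model = AutoModelForCausalLM.from_pretrained(model_id)

lora = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling applied to the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Because only the small adapter matrices are trained, fine-tuning fits on far fewer accelerators, which is where Trainium's price performance compounds.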
Adapting LLMs for domain-aware applications with AWS Trainium post-training: This workshop teaches LLM domain adaptation using AWS Trainium, Inferentia, SageMaker, and Hugging Face Optimum Neuron. You'll fine-tune pre-trained LLMs for specific domains with Trainium for cost-effective, scalable training and Inferentia for low-latency, high-throughput inference. Both accelerators integrate seamlessly with AWS services for efficient, secure workflows.
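On the inference side, these workshops build on optimum-neuron. Here is a minimal sketch of compiling an LLM for Inferentia; the model ID and compilation arguments are illustrative assumptions, not the workshop's exact values.

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder model
# export=True traces and compiles the model for NeuronCores with static shapes.
model = NeuronModelForCausalLM.from_pretrained(
    model_id,
    export=True,
    batch_size=1,
    sequence_length=1024,
    num_cores=2,            # NeuronCores to shard the model across
    auto_cast_type="bf16",
)
tok = AutoTokenizer.from_pretrained(model_id)
inputs = tok("AWS Trainium is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tok.batch_decode(out, skip_special_tokens=True)[0])
```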
Fine-tune Hugging Face LLMs using Amazon SageMaker and AWS Trainium: This workshop focuses on fine-tuning and deploying LLMs using Amazon SageMaker, Trainium, and Inferentia. You'll explore efficient techniques like LoRA for domain-specific tuning, leveraging SageMaker for orchestration. Targeted at ML practitioners, this session offers hands-on experience with cost-effective, scalable model training and inference in supported AWS regions.
Demystifying LLM deployment and optimization with AWS Inferentia: In this workshop, you'll learn LLM inference optimization on AWS AI chips, deploy Mistral using vLLM on Amazon EKS with AWS Inferentia, and use the NeuronX Hugging Face Text Generation Inference (TGI) container to deploy Llama 3.2 1B on SageMaker with AWS Inferentia.
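Once a vLLM server like the one this workshop deploys is running, it exposes an OpenAI-compatible API. Here is a minimal client call against a hypothetical endpoint URL:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://vllm-service.example.internal:8000/v1",  # hypothetical URL
    api_key="not-needed",  # vLLM ignores the key unless auth is configured
)
resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",  # the model vLLM is serving
    messages=[{"role": "user", "content": "Summarize AWS Inferentia in one line."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```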
Keeping it Small: Agentic Workflows with Small Language Models (SLMs) on AWS Inferentia: In this workshop, you will create a personalized solution that balances the innovative capabilities of LLMs with adherence to human directives and human-curated assets, delivering a consistent and responsible personalization experience for the customers of a fictional company named OneCompany Consulting.
Harness FMs with AWS purpose-built accelerators for industry apps: This workshop shows how to harness foundation models (FMs) with AWS's purpose-built accelerators, Trainium and Inferentia, to build industry-specific applications. You'll learn how these accelerators streamline the development, deployment, and management of FM-powered solutions, helping you deliver innovation faster while addressing unique industry challenges.
What’s next?
Now you're up to speed on the exciting re:Invent 2024 announcements for Trainium and Inferentia. You've seen how AI leaders are harnessing the power of these innovative technologies to accelerate model building and deployment while reducing costs. It's time to take the next step. Dive into one of our self-paced workshops above, carefully designed to help you get started with Trainium or Inferentia. For a closer look at AWS Trainium2, visit our Trainium page. And, be sure to explore the Neuron Docs, your go-to resource for tutorials, examples, and expert guidance to fuel your AI journey.