re:Invent 2025 - Building the future with AWS Serverless
AWS Lambda turns 10, and the biggest objections to running production workloads on Serverless have never had better answers. This session from re:Invent 2025 covers three major launches: the Serverless MCP server, Lambda Managed Instances, and Lambda Durable Functions, along with a live-built application showing each capability in action.
Serverless compute promises to remove infrastructure concerns so developers can focus on business logic, but specific workload types have consistently sent engineering teams elsewhere. Cost at scale, CPU-intensive processing, and long-running workflows requiring reliability guarantees were the most cited objections. Usman Khalid, Head of Serverless Compute at AWS, and Janak Agarwal, Head of Product for AWS Lambda, addressed each of these directly through a live demonstration. They built a note-taking application from scratch, scaled it to absorb sudden traffic bursts, added encryption and sentiment analysis, and finished by building a workflow that calls an LLM to summarize notes. In this post, we'll walk through the three capabilities they introduced and what each one enables for your workloads.
AI-assisted development with the Serverless MCP Server
One common frustration with AI-assisted code generation is that it speeds up writing code without speeding up shipping it. Generated output often lacks IaC (Infrastructure as Code), has inconsistent error handling, and needs significant cleanup before it meets production quality. The Serverless Model Context Protocol (MCP) server addresses this by giving AI coding assistants a set of tools grounded in Serverless best practices.
Janak prompted his AI coding assistant to build a CRUD (Create, Read, Update, Delete) API with TypeScript, Amazon API Gateway, Lambda functions, Amazon DynamoDB for storage, and structured logging with Amazon CloudWatch. Without any additional prompting, the generated output included input validation, three-retry logic for DynamoDB writes, consistent HTTP status codes, a global error handler, and a complete AWS SAM (Serverless Application Model) template. The MCP server guides the AI toward this output by providing tools that describe well-architected patterns for Lambda: which workloads are appropriate, how event-driven architectures should be structured, how to configure integrations, and how to produce deployable IaC alongside the application code.
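To make those qualities concrete, here is a minimal sketch of a create-note handler with input validation, a three-attempt write, and consistent status codes. It is not the code generated in the demo (which was TypeScript). The `InMemoryNoteStore`, `TransientStoreError`, and `create_note` names are hypothetical, and the in-memory store stands in for DynamoDB so the example runs locally.

```python
import json
import uuid

MAX_ATTEMPTS = 3  # mirrors the three-retry logic described above


class TransientStoreError(Exception):
    """Hypothetical stand-in for a retryable storage failure (e.g. throttling)."""


class InMemoryNoteStore:
    """Hypothetical stand-in for a DynamoDB table so the sketch runs locally."""

    def __init__(self, failures_before_success=0):
        self.items = {}
        self._failures_left = failures_before_success

    def put(self, item):
        # Simulate a configurable number of transient failures before succeeding.
        if self._failures_left > 0:
            self._failures_left -= 1
            raise TransientStoreError("simulated transient failure")
        self.items[item["id"]] = item


def response(status, body):
    """Build an API Gateway proxy-style response with a JSON body."""
    return {"statusCode": status, "body": json.dumps(body)}


def put_with_retry(store, item):
    """Try the write up to MAX_ATTEMPTS times; re-raise after the last failure."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            store.put(item)
            return attempt
        except TransientStoreError:
            if attempt == MAX_ATTEMPTS:
                raise


def create_note(event, store):
    """Validate input, write with retries, and map each outcome to a status code."""
    try:
        payload = json.loads(event.get("body") or "")
    except json.JSONDecodeError:
        return response(400, {"error": "body must be valid JSON"})
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return response(400, {"error": "'text' must be a non-empty string"})
    item = {"id": str(uuid.uuid4()), "text": text.strip()}
    try:
        put_with_retry(store, item)
    except TransientStoreError:
        return response(503, {"error": "storage temporarily unavailable"})
    return response(201, item)
```

A production handler would typically add a backoff delay between attempts and structured logging. Both are left out here to keep the sketch short.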
Deployment uses SAM, which builds the TypeScript, surfaces issues like missing API authentication before deployment completes, generates a CloudFormation change set showing exactly what will be created, and deploys the full stack. The entire process, from a natural language prompt to five deployed Lambda functions with a working API, took under 10 minutes in the demo. The practical impact is that IaC, which has historically been the slowest part of the process to produce correctly, is now generated at the same time and to the same quality level as the business logic itself.
Steady-state and CPU-intensive workloads with Lambda Managed Instances
Lambda scales quickly. In the session's load test, traffic increased 700 to 800 times over 30 seconds with zero errors and zero throttles. This makes Lambda well suited to spiky traffic patterns, such as a flash sale or a viral event, where you need to absorb a sudden surge without pre-provisioning capacity.
But as applications mature, the workload profile often shifts. Workloads such as encryption and decryption, sentiment analysis, and model inference are CPU-intensive, and traffic becomes steady and predictable rather than spiky. At that point, teams have typically re-architected away from Lambda to access larger compute instances and to apply savings plans that reduce cost for sustained usage. Lambda Managed Instances (LMI) removes that tradeoff.
With LMI, you create a capacity provider that specifies EC2 instance types, VPC subnet and security group configuration, and optional scaling bounds. Lambda then provisions, scales, patches, and routes traffic to those instances automatically. You can select instance families like Graviton4 for cost and performance, choose memory-to-CPU ratios that match compute-intensive or memory-intensive workloads, and enable multi-concurrency so a single execution environment handles multiple simultaneous requests. Multi-concurrency has been a long-standing customer request, and it directly reduces cost because fewer execution environments are needed at any given concurrency level.
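The effect of multi-concurrency on environment count is simple arithmetic. The sketch below assumes every environment can be filled to its per-environment concurrency limit, which is an idealization. The function name and the example figures are illustrative and do not come from the session.

```python
def environments_needed(concurrent_requests, requests_per_environment=1):
    """Return how many execution environments cover a given concurrency level.

    Assumes requests pack perfectly into environments (an idealization).
    """
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests must be non-negative")
    if requests_per_environment < 1:
        raise ValueError("requests_per_environment must be at least 1")
    # Integer ceiling division: round up so no request is left without a slot.
    return -(-concurrent_requests // requests_per_environment)


# Illustrative figures only: 1,000 simultaneous requests.
single = environments_needed(1000, requests_per_environment=1)
multi = environments_needed(1000, requests_per_environment=10)
print(single, multi)
```

With one request per environment, the count equals the concurrency level. Allowing ten requests per environment cuts the count by a factor of ten in this idealized model.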
In a 90-minute load test with steady-state traffic, LMI's built-in autoscaling maintained around 25% instance utilization, compared to the roughly 6% utilization that is common when teams provision for peak and never revisit scaling policies. AWS charges a 15% management fee on top of EC2 instance costs, covering automatic scaling, patching, and continuous optimization. In the demo, the total EC2 cost for the load test came to approximately $2.05 with LMI, versus $8.50 for the same traffic profile on a fixed provisioned fleet. Critically, LMI does not change how you write, deploy, or monitor Lambda functions. The same CloudWatch metrics, CI/CD pipelines, and event source integrations continue to work unchanged.
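The figures above can be checked with a few lines of arithmetic. The session numbers are the 15% fee, $2.05, and $8.50. The post does not say whether the $2.05 already includes the management fee, so the sketch assumes it does not and computes both views. The function names are illustrative.

```python
MANAGEMENT_FEE_RATE = 0.15  # 15% on top of EC2 instance cost, per the session


def management_fee(ec2_cost):
    """Fee charged on top of the EC2 instance cost."""
    return ec2_cost * MANAGEMENT_FEE_RATE


def total_with_fee(ec2_cost):
    """EC2 cost plus the management fee."""
    return ec2_cost + management_fee(ec2_cost)


def relative_saving(baseline, candidate):
    """Fraction by which `candidate` is cheaper than `baseline`."""
    return (baseline - candidate) / baseline


lmi_ec2_cost = 2.05      # demo figure for LMI
fixed_fleet_cost = 8.50  # demo figure for the fixed provisioned fleet

# Assumption: the $2.05 excludes the fee, so add it for a conservative view.
lmi_total = total_with_fee(lmi_ec2_cost)

print(f"saving on EC2 cost alone: {relative_saving(fixed_fleet_cost, lmi_ec2_cost):.0%}")
print(f"saving with fee added:    {relative_saving(fixed_fleet_cost, lmi_total):.0%}")
```

On EC2 cost alone, LMI comes out roughly 76% cheaper in the demo. If the fee is added on top, the saving is roughly 72%. The cost ratio of about 4x also lines up with the utilization gap of roughly 25% versus 6%.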
Long-running and reliable workflows with Lambda Durable Functions
Lambda functions have a 15-minute execution timeout. For most event-driven workloads, this is sufficient. But workflows that call LLMs and wait for a response, workflows that pause for human review before continuing, and multi-step transactions that need retry guarantees and exactly-once execution semantics have required either AWS Step Functions or custom orchestration built on queues. Lambda Durable Functions brings workflow reliability directly into the Lambda function itself.
Durable Functions adds a durable context object to the Lambda runtime. You write sequential code, using the context object to define steps and checkpoints. At any point, you can call context.wait() with a condition, which shuts down the execution environment until the condition resolves. During the wait, you pay nothing for compute. When execution resumes, it picks up from the step after the checkpoint. A 30-step workflow that has completed 25 steps and then fails does not replay the first 25 steps on retry; it retrieves the checkpointed results and continues from where it stopped.
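The checkpoint-and-replay behavior can be illustrated with a toy model. This is not the Lambda Durable Functions SDK: `ToyDurableContext`, `step`, and `run_workflow` are hypothetical names, and a plain dictionary stands in for the durable checkpoint store. The sketch reproduces the 30-step scenario above, with a failure at step 26.

```python
class ToyDurableContext:
    """Toy model of checkpointed steps. NOT the real Durable Functions API."""

    def __init__(self, checkpoints=None):
        # In the real service, checkpoints live in durable storage.
        # Here a shared dict plays that role (assumption for illustration).
        self.checkpoints = {} if checkpoints is None else checkpoints
        self.executed = []  # steps whose functions actually ran in this attempt

    def step(self, name, fn):
        """Run `fn` once per step name; later attempts reuse the saved result."""
        if name in self.checkpoints:
            return self.checkpoints[name]
        result = fn()
        self.checkpoints[name] = result  # checkpoint only after success
        self.executed.append(name)
        return result


def run_workflow(ctx, fail_at=None, total_steps=30):
    """Sequential workflow of `total_steps` steps; optionally fail at one step."""
    total = 0
    for i in range(1, total_steps + 1):
        def work(i=i):
            if i == fail_at:
                raise RuntimeError(f"step {i} failed")
            return i * 2
        total += ctx.step(f"step-{i}", work)
    return total


store = {}  # shared checkpoint store that survives across attempts

first = ToyDurableContext(store)
try:
    run_workflow(first, fail_at=26)  # steps 1-25 succeed, step 26 fails
except RuntimeError:
    pass

second = ToyDurableContext(store)  # retry: a fresh context, same checkpoints
result = run_workflow(second)

print(len(first.executed), second.executed, result)
```

On the retry, only steps 26 through 30 execute. The first 25 return their checkpointed results without running again, which is the property the paragraph above describes.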
The underlying infrastructure is shared with Step Functions, which provides the durability guarantees. Idempotency is handled by assigning a unique name to each execution, preventing duplicate workflow instances from starting for the same operation. Workflows can run for up to one year, making it practical to build human-in-the-loop systems where a workflow might pause for hours or days waiting on a reviewer. The Lambda console surfaces a Durable Executions tab that shows each step's progress and status in real time. Current language support covers Node.js and Python, with additional runtimes planned.
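Name-based idempotency can be pictured with another toy model. It is not the service API: `ToyExecutionRegistry` and its `start` method are hypothetical, and the session does not state what a duplicate start returns. This sketch assumes the existing execution is handed back.

```python
class ToyExecutionRegistry:
    """Toy model of name-based idempotency. NOT the real service API."""

    def __init__(self):
        self._executions = {}

    def start(self, name, payload):
        """Start an execution unless one with this name already exists.

        Returns (execution, created). Returning the existing execution on a
        duplicate is an assumption made for this illustration.
        """
        existing = self._executions.get(name)
        if existing is not None:
            return existing, False
        execution = {"name": name, "payload": payload, "status": "RUNNING"}
        self._executions[name] = execution
        return execution, True

    def count(self):
        """Number of distinct executions started so far."""
        return len(self._executions)


registry = ToyExecutionRegistry()
# Hypothetical execution name derived from the business operation.
first_exec, first_created = registry.start("summarize-note-123", {"note_id": 123})
dup_exec, dup_created = registry.start("summarize-note-123", {"note_id": 123})

print(first_created, dup_created, registry.count())
```

Deriving the name from the business operation (here, a made-up note ID) means a retried or duplicated trigger maps to the same execution instead of starting a second workflow.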
What this means for your architecture
The session's broader point is that the main objections to Serverless at production scale now have direct answers. The MCP server removes the IaC complexity that slowed AI-assisted development. LMI removes the cost and compute flexibility objections for steady-state workloads. Durable Functions removes the timeout constraint for long-running and human-in-the-loop workflows.
The session also briefly introduced Lambda Tenant Isolation, which lets you pass a tenant ID at invocation time so Lambda creates fully isolated execution environments per tenant, without requiring a separate function per customer. This is relevant for SaaS applications where customer data separation is a hard requirement.
Looking ahead, the team highlighted native OpenTelemetry (OTel) support for observability, Rust runtime support (which launched shortly before re:Invent), and continued investment in developer tooling, including remote breakpoint debugging for Lambda functions through the AWS Toolkit for VS Code. A public Lambda roadmap is now available for feedback. If any of these capabilities address problems you have been working around, the session recording covers the full live demos in detail.