re:Invent 2025 - Build, deploy, and operate agentic architectures on AWS Serverless
Building agent applications in production requires more than picking an LLM. You need to think through how agents gather context, how they communicate with tools and other agents, and where they run at scale. This session from re:Invent 2025 covers the design principles, open standards, and practical Serverless deployment patterns that make agentic architectures production-ready.
Most teams building LLM-based applications start with the same question: how does the model get access to the data it needs to reason well? Heeki Park, Principal Solutions Architect at AWS covering generative AI, and Dhiraj Mahapatro, Principal Specialist Solutions Architect at AWS, grounded this question in a concrete scenario throughout the session: a customer is charged twice for an order, receives only one of two items, and needs the issue resolved quickly. Resolving this request requires pulling data from at least three systems (orders, payments, and inventory) and generating an actionable recommendation under time pressure. In this post, we'll walk through how they evolved from basic LLM context retrieval to multi-agent coordination, which open standards make that coordination manageable, and how to deploy these patterns using Serverless on AWS.
From context retrieval to agentic loops
Before an agent can reason effectively, it needs context. The simplest approach is retrieval-augmented generation (RAG): your application constructs a prompt, fetches relevant data from a knowledge source, and submits an enhanced prompt to the model. The application does the orchestration here. It decides what data to fetch and when.
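To make the application-driven orchestration concrete, here is a minimal sketch of the RAG flow in plain Python. The knowledge entries, the keyword-matching `retrieve` function, and the prompt layout are all made up for illustration; a real system would query a search index or vector store and then send the prompt to a model.

```python
# Hypothetical in-memory knowledge source with made-up policy text.
# A real application would query a vector store or search index instead.
KNOWLEDGE = {
    "refund policy": "Duplicate charges are refunded to the original payment method.",
    "shipping policy": "Missing items are reshipped at no cost.",
}


def retrieve(question: str) -> list[str]:
    """Application-chosen retrieval: naive match on the first topic keyword."""
    q = question.lower()
    return [text for topic, text in KNOWLEDGE.items() if topic.split()[0] in q]


def build_prompt(question: str) -> str:
    """Combine the retrieved context with the user's question."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"


# The application, not the model, decided what to fetch.
prompt = build_prompt("How long does a refund take?")
print(prompt)
```

Note that every retrieval decision lives in application code here, which is the property the tool-use approach changes.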
Tool use takes this further. Instead of hardcoding which data to retrieve, you provide the model with a list of available tools and let it decide which ones it needs. The model responds with a tool selection, your application executes the call, returns the result, and the model continues reasoning. This shifts orchestration logic to the model and reduces the amount of imperative code you write.
An agentic loop extends this into a repeating cycle. The model requests tools, receives results, and continues reasoning until it reaches a completion state. In the session's refund scenario, an order agent calls a get-order-details tool, then a get-invoice tool, then a start-refund-process tool, feeding each result back to the model before the next step. You could implement the same flow in AWS Step Functions, but the state machine would need a choice state that explicitly branches to each possible tool, which means a code change, build, and deployment every time you add a new tool. With an agentic SDK, the framework routes tool calls dynamically.
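The loop itself is small. The sketch below is plain Python rather than any particular SDK: the `tool` decorator, the tool bodies, the IDs, and `scripted_model` (a stand-in for the LLM that picks the next step from the results so far) are all assumptions made for illustration. What it shows is that the loop routes by tool name, so adding a tool does not require a new branch.

```python
from typing import Callable

TOOLS: dict[str, Callable[..., dict]] = {}


def tool(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Register a function as a tool by name (stand-in for an SDK decorator)."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_order_details(order_id: str) -> dict:
    return {"order_id": order_id, "invoice_id": "INV-1", "items_shipped": 1}


@tool
def get_invoice(invoice_id: str) -> dict:
    return {"invoice_id": invoice_id, "charges": 2}


@tool
def start_refund_process(invoice_id: str) -> dict:
    return {"invoice_id": invoice_id, "refund": "started"}


def scripted_model(history: list[dict]) -> dict:
    """Stand-in for an LLM: chooses the next tool from what has been done."""
    done = [step["tool"] for step in history]
    if "get_order_details" not in done:
        return {"tool": "get_order_details", "args": {"order_id": "ORD-1"}}
    invoice_id = history[-1]["result"]["invoice_id"]
    if "get_invoice" not in done:
        return {"tool": "get_invoice", "args": {"invoice_id": invoice_id}}
    if "start_refund_process" not in done:
        return {"tool": "start_refund_process", "args": {"invoice_id": invoice_id}}
    return {"final": "Refund started for the duplicate charge."}


def run_agent(model: Callable[[list[dict]], dict], max_steps: int = 10):
    """Call the model, run the tool it asks for, feed the result back, repeat."""
    history: list[dict] = []
    for _ in range(max_steps):
        decision = model(history)
        if "final" in decision:  # completion state reached
            return decision["final"], history
        result = TOOLS[decision["tool"]](**decision["args"])  # dynamic routing
        history.append({"tool": decision["tool"], "result": result})
    raise RuntimeError("agent did not finish within max_steps")


answer, trace = run_agent(scripted_model)
print([step["tool"] for step in trace], answer)
```

The `max_steps` guard is a design choice worth keeping in any real loop, since a model is not guaranteed to reach a completion state.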
Dhiraj demonstrated this using Strands Agents, an open-source SDK released by AWS earlier in 2025. The order agent, with three tools and a full agentic loop, fit in a single Python class where each tool was a decorated method. No routing logic was needed. The model, running on Amazon Bedrock, determined which tool to call and in what order based on the current context. Strands supports non-Bedrock model providers and runs on Lambda, Fargate, EKS, or EC2.
Open standards for multi-agent coordination: MCP and A2A
As your agent system grows beyond a single domain, coordination becomes the central challenge. An orders agent needs data from a payments agent. Without a shared protocol, the orders agent would need to understand the payments domain's internal tools, data contracts, and authentication requirements. This is the same coupling problem that domain-driven design and microservices were designed to solve, applied to agents.
The Model Context Protocol (MCP), developed by Anthropic, addresses the agent-to-tool communication problem. MCP defines a client-server protocol where agents act as MCP clients and tool providers expose MCP servers. Rather than supporting multiple protocols for SQL, GraphQL, or REST endpoints, both sides converge on MCP. Tool providers build one interface; agent builders support one standard. Dhiraj showed an MCP server built with FastMCP in a few lines of Python, and a Strands agent consuming it as an MCP client with a single import. Strands includes built-in MCP client support.
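MCP messages are JSON-RPC 2.0 requests and responses. As a rough illustration of what travels between client and server, the sketch below dispatches `tools/list` and `tools/call` requests in process. It is not the FastMCP code from the session, and it leaves out the initialization handshake, transports, and most error handling; the tool name and its data are hypothetical.

```python
import json


def get_order_details(order_id: str) -> str:
    """Hypothetical tool body returning made-up order data as text."""
    return json.dumps({"order_id": order_id, "status": "partially_shipped"})


TOOLS = {
    "get_order_details": {
        "handler": get_order_details,
        "description": "Look up an order by ID.",
        "inputSchema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    }
}


def handle(request: dict) -> dict:
    """Dispatch one JSON-RPC 2.0 request in the style of MCP tool methods."""
    method = request["method"]
    params = request.get("params", {})
    if method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": t["description"],
                 "inputSchema": t["inputSchema"]}
                for name, t in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        text = TOOLS[params["name"]]["handler"](**params["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


# The client first discovers the tools, then calls one by name.
listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = handle({
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "get_order_details", "arguments": {"order_id": "ORD-1"}},
})
print(listing["result"]["tools"][0]["name"], call["result"]["content"][0]["text"])
```

In practice an MCP server library generates the tool listing and input schema from the function signature, which is why the session's server fit in a few lines.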
Agent-to-Agent (A2A), a protocol from Google, addresses agent-to-agent communication. With A2A, the payments agent publishes an agent card: a structured document describing its name, version, skills, supported interaction modes, and authentication requirements. The orders agent fetches that card and creates a task for the payments agent rather than calling its internal tools directly. Domain knowledge stays encapsulated. The recommended pattern is MCP within a domain for agent-to-tool calls, and A2A across domains for agent-to-agent delegation.
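A rough sketch of the delegation step follows. The card below carries the kinds of information listed above, but its key names, the skill ID, and the task shape are simplified assumptions for illustration and should not be read as the A2A wire format; consult the A2A specification for the actual schema. `fetch_agent_card` stands in for an HTTP fetch.

```python
import uuid

# Hypothetical, simplified agent card for the payments domain.
PAYMENTS_CARD = {
    "name": "payments-agent",
    "version": "1.0.0",
    "skills": [{"id": "refund", "description": "Refund a duplicate charge."}],
    "input_modes": ["text"],
    "auth": {"type": "oauth2"},
}


def fetch_agent_card() -> dict:
    """Stand-in for fetching the card that the payments agent publishes."""
    return PAYMENTS_CARD


def create_task(card: dict, skill_id: str, text: str) -> dict:
    """Build a task for a published skill; internal tools are never referenced."""
    published = {skill["id"] for skill in card["skills"]}
    if skill_id not in published:
        raise ValueError(f"{card['name']} does not publish skill {skill_id!r}")
    return {
        "task_id": str(uuid.uuid4()),
        "agent": card["name"],
        "skill": skill_id,
        "input": text,
    }


card = fetch_agent_card()
task = create_task(card, "refund", "Order ORD-1 was charged twice.")
print(task["agent"], task["skill"])
```

The orders agent only ever sees the card and the task, so the payments team can change its internal tools without breaking callers.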
Adding event-driven architecture on top of this further reduces coupling. In the session's pattern, an incoming API call generates an event on Amazon EventBridge. That event triggers the orders agent. The orders agent coordinates with the payments agent via A2A. A completion event triggers a WebSocket notification to the end user. If the payments agent fails, the event is retained and can be replayed. New agents can subscribe to existing events without modifying the running system.
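As an illustration of the first hop, the sketch below builds an entry in the shape that the EventBridge `PutEvents` API accepts and checks it against a rule pattern. The source name, detail type, bus name, and payload are hypothetical, and `matches` implements only a small exact-match subset of EventBridge pattern matching so the example runs without AWS access.

```python
import json


def build_entry(order_id: str) -> dict:
    """Build one PutEvents entry; all names here are hypothetical."""
    return {
        "Source": "support.api",
        "DetailType": "SupportRequestReceived",
        "Detail": json.dumps({"order_id": order_id, "issue": "double_charge"}),
        "EventBusName": "support-bus",
    }


# A rule pattern the orders agent might subscribe with.
ORDERS_AGENT_PATTERN = {
    "source": ["support.api"],
    "detail-type": ["SupportRequestReceived"],
}


def matches(pattern: dict, entry: dict) -> bool:
    """Simplified matching on source and detail-type only."""
    delivered = {"source": entry["Source"], "detail-type": entry["DetailType"]}
    return all(delivered[key] in allowed for key, allowed in pattern.items())


entry = build_entry("ORD-1")
print(matches(ORDERS_AGENT_PATTERN, entry))
```

With boto3, an entry like this would be sent with `put_events(Entries=[entry])` on an EventBridge client. A new agent that wants the same events adds its own rule with a matching pattern, and the publisher does not change.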
Serverless deployment options for agents and tools
For teams deploying agents in production, there are three primary compute options depending on your existing CI/CD practices.
The first is AWS Lambda with Amazon API Gateway. Your Strands agent or MCP server runs inside a Lambda handler. API Gateway provides authentication, Web Application Firewall (WAF) integration, and CloudFront compatibility. Across domains, the orders Lambda makes A2A calls to the API Gateway endpoint of the payments agent. Within each domain, MCP tool calls happen inside the Lambda execution environment. This is the lowest barrier to entry for teams already using Lambda.
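A minimal sketch of the Lambda side, assuming an API Gateway proxy integration that delivers the request body as a JSON string. The `prompt` field and `run_orders_agent` are hypothetical; a real handler would construct and invoke the agent where the stub is.

```python
import json


def run_orders_agent(prompt: str) -> str:
    """Stand-in for invoking the agent; a real handler would call the agent SDK."""
    return f"Received: {prompt}"


def handler(event: dict, context: object) -> dict:
    """Lambda entry point for an API Gateway proxy integration."""
    try:
        body = json.loads(event.get("body") or "{}")
        prompt = body["prompt"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "expected a JSON body with 'prompt'"}),
        }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": run_orders_agent(prompt)}),
    }


# Local invocation with a hand-built event, no AWS access needed.
response = handler({"body": json.dumps({"prompt": "I was charged twice."})}, None)
print(response["statusCode"], response["body"])
```

Authentication and WAF rules stay at the API Gateway layer, so the handler only validates the payload shape.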
The second is Amazon ECS on AWS Fargate with an Application Load Balancer (ALB). Teams whose CI pipelines produce container artifacts can package a FastAPI and Strands application, or an MCP server, as a container image, push it to an Amazon Elastic Container Registry (ECR) repository, and deploy it to Fargate. The deployment model fits existing container workflows.
The third option is Amazon Bedrock AgentCore, which reached general availability a couple of months before the session. AgentCore Runtime is Serverless compute built for agent workloads. It accepts both zip and container artifacts, so teams can adopt it without changing their existing deployment pipelines. AgentCore Gateway acts as a front end for tools, analogous to API Gateway for APIs. You can point it at an existing Lambda function, add a tool description, and expose that function as an agent tool without refactoring the function. OpenAPI schemas can also be imported and exposed as tools directly.
AgentCore includes two additional capabilities worth noting. AgentCore Identity propagates the original user's OAuth2 scopes down through MCP tool calls so agents cannot escalate privileges beyond what the originating user was authorized to access. There is also a managed code interpreter tool that runs untrusted, LLM-generated code in an isolated environment with no access to your production data.
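The principle behind scope propagation can be shown in a few lines. This is not the AgentCore Identity API; it is a plain-Python sketch of the check that propagation makes possible, with hypothetical scope and tool names: a tool call succeeds only if the originating user's scopes cover what the tool requires.

```python
# Hypothetical mapping from tool name to the scopes it requires.
REQUIRED_SCOPES = {
    "get_order_details": {"orders:read"},
    "start_refund_process": {"payments:refund"},
}


def call_tool(name: str, user_scopes: frozenset[str]) -> str:
    """Run a tool only if the originating user's scopes allow it."""
    missing = REQUIRED_SCOPES[name] - user_scopes
    if missing:
        raise PermissionError(f"missing scopes: {sorted(missing)}")
    return f"{name} executed"


# The agent carries the user's scopes with every call, not its own.
user_scopes = frozenset({"orders:read"})
print(call_tool("get_order_details", user_scopes))

try:
    call_tool("start_refund_process", user_scopes)
except PermissionError as err:
    print(err)
```

Because the check uses the user's scopes rather than the agent's, a prompt that persuades the agent to attempt a refund still fails for a user who was never allowed to issue one.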
Choosing a multi-agent pattern
The session covered three multi-agent patterns. A serialized pipeline suits linear workflows like content generation, but a failure at one stage blocks the rest. The orchestrator pattern uses a top-level agent to decompose a request and delegate to domain agents. This is the pattern Heeki and Dhiraj demonstrated throughout the session: an orchestrator handles the support request by calling the orders, payments, and inventory agents as needed. It improves context window efficiency and allows each domain agent to scale independently, but adds network overhead between agents.
A swarm architecture sends multiple agents in parallel to gather information from different sources and synthesizes the results afterward. This suits exploratory research tasks where the path to a result is not known in advance, but is harder to reason about and debug.
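The fan-out and synthesis shape of a swarm can be sketched with the standard library. The three agent functions below are hypothetical stubs returning canned findings, and the synthesis step is a plain string join where a real system would hand the findings back to a model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def orders_agent(question: str) -> str:
    return "orders: one of two items shipped"


def payments_agent(question: str) -> str:
    return "payments: two charges on one invoice"


def inventory_agent(question: str) -> str:
    return "inventory: second item in stock"


def swarm(question: str, agents: list[Callable[[str], str]]) -> str:
    """Run all agents in parallel, then combine their findings."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # map() returns results in the order the agents were given.
        findings = list(pool.map(lambda agent: agent(question), agents))
    return " | ".join(findings)


summary = swarm(
    "Customer was charged twice and is missing an item.",
    [orders_agent, payments_agent, inventory_agent],
)
print(summary)
```

Even in this small form the debugging cost is visible: the agents finish in no guaranteed order, so tracing which finding drove the final answer takes more instrumentation than a pipeline or orchestrator needs.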
The practical starting point from the session: begin with a single monolithic agent. It is simpler to develop, simpler to debug, and produces clearer reasoning traces. As domain boundaries become clear and the use case grows, decompose into domain agents and apply MCP and A2A for coordination. Lambda, Fargate, and AgentCore Runtime are each available as compute targets regardless of which pattern you choose.
