re:Invent 2025 - Streamline Amazon EKS operations with Agentic AI
When a Kubernetes incident fires, the investigation typically involves a sequence of manual steps: running kubectl commands to collect context, scanning logs, searching Slack history for similar past incidents, and applying a fix. Every one of those handoffs is time the issue remains unresolved. Session CNS421 shows a live-coded multi-agent system that handles the full loop from alert to remediation, built with an open source agent SDK, a hosted MCP server, and a vector knowledge store.
When a Kubernetes pod fails at 2 AM, the typical response involves waking up the right engineer, who then manually collects cluster state, searches for prior solutions, and applies a fix. In session CNS421 at AWS re:Invent 2025, Sai Vennam, Principal Solutions Architect at AWS, and Lucas Soriano Alves Duarte, Principal Solutions Architect at AWS, live-coded a Kubernetes troubleshooting agent they called the Kube Agent: a multi-agent system built with the Strands open source SDK that receives alerts from Slack, uses the hosted Amazon EKS MCP server for live cluster access, classifies message intent using Amazon Nova Micro, and stores and retrieves institutional knowledge using S3 Vectors in Amazon S3. In this post, we'll walk through the architecture they built and the design decisions behind each layer.
Why retrieval augmented generation alone isn't enough
The prior year's approach for large language model (LLM)-assisted Kubernetes troubleshooting relied on Retrieval Augmented Generation (RAG): chunking cluster logs, converting the chunks to embeddings, storing them in a vector database, and injecting relevant context into the LLM prompt when an engineer asked a question. This works reasonably well for static documentation and runbooks, but it has two limitations that matter for real incident response.
The first is staleness. Logs and cluster state were being chunked and stored on a schedule, sometimes every 30 to 60 minutes. If a pod was just deployed and started failing, that data might not yet be in the vector database when an engineer needed it. The second is manual effort. Engineers would run kubectl commands themselves, copy the output, and paste it into the LLM. The model was providing guidance, but the diagnostic loop still ran through a person.
What changed is the availability of the Model Context Protocol (MCP) and the Strands open source SDK for building agents. MCP gives agents a standardized way to call external tools and retrieve live data. Rather than building integrations for each combination of LLM and tool, MCP acts as a single integration layer that agents can discover at runtime. Strands, built by AWS and contributed to open source, provides the Python SDK to put this together with minimal boilerplate, including built-in support for multi-agent orchestration, before-invocation hooks, and MCP client connectivity.
Three layers that make the Kube Agent work
The session built up the agent in three distinct steps, each adding a meaningful capability.
Live cluster access through the hosted EKS MCP server. The initial agent had two manually coded tools: describe_pod and get_pods. When asked what namespaces exist in the cluster, it could only infer namespaces indirectly from pod listings and returned an incomplete answer. After enabling the hosted EKS MCP server, the agent had access to the full range of tools the MCP server exposes without additional code. For the same namespace question, the agent automatically called list_k8s_resources, a tool the agent's own code never defined and that was discovered from the MCP server at runtime, and returned the correct answer.
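To make the starting point concrete, here is a sketch of what the two hand-coded tools might look like before MCP enters the picture. The tool names (describe_pod, get_pods) come from the session; the kubectl wrapping and helper function are assumptions for illustration, not the session's exact code.

```python
import json
import subprocess


def pod_names(pod_list: dict) -> list[str]:
    """Extract pod names from a kubectl-style JSON pod list."""
    return [p["metadata"]["name"] for p in pod_list.get("items", [])]


def kubectl_json(*args: str) -> dict:
    """Run a kubectl command and parse its JSON output."""
    out = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def get_pods(namespace: str = "default") -> list[str]:
    """List pod names in a namespace (one of the two hand-coded tools)."""
    return pod_names(kubectl_json("get", "pods", "-n", namespace))


def describe_pod(name: str, namespace: str = "default") -> dict:
    """Return the full pod object for the model to reason over."""
    return kubectl_json("get", "pod", name, "-n", namespace)
```

The limitation is visible in the code itself: the agent can only answer questions these two functions happen to cover, which is why the namespace question produced an incomplete answer.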
The hosted EKS MCP server removes the requirement to run a local MCP server process. It runs in the cloud, uses AWS Identity and Access Management (IAM) for authorization, and connects to the cluster through the standard EKS API. The Strands MCP client connects to it with a few lines of configuration, and the agent discovers available tools at runtime. Before this capability, teams had to run a local proxy, manage permissions separately, and keep the MCP server process running alongside the agent.
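The connection described above can be sketched with the Strands MCP client as follows. The endpoint URL shape is a placeholder, not the documented address, and the IAM/SigV4 authentication wiring is omitted; check the EKS documentation for the actual hosted MCP server URL in your Region.

```python
def eks_mcp_endpoint(region: str) -> str:
    """Placeholder endpoint shape for the hosted EKS MCP server.

    This URL format is an assumption for illustration -- look up the
    real endpoint in the EKS documentation for your Region.
    """
    return f"https://mcp.eks.{region}.api.aws/mcp"


def ask_cluster(question: str, region: str = "us-west-2") -> str:
    """Connect a Strands agent to the hosted EKS MCP server and let it
    discover the server's tools at runtime.

    Requires the `strands-agents` and `mcp` packages; imports are local
    so the module loads without them. Auth (SigV4 via IAM) is omitted.
    """
    from mcp.client.streamable_http import streamablehttp_client
    from strands import Agent
    from strands.tools.mcp import MCPClient

    mcp_client = MCPClient(lambda: streamablehttp_client(eks_mcp_endpoint(region)))
    with mcp_client:
        # Tools are discovered from the server, not hand-coded.
        agent = Agent(tools=mcp_client.list_tools_sync())
        return str(agent(question))
```

The key difference from the hand-coded version is the `list_tools_sync()` call: the agent's tool set is whatever the server exposes, so new server-side tools become available without redeploying the agent.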
Smart message routing with Amazon Nova Micro. Because the Slack bot listens to every message in a channel, routing each one to the full Claude-powered troubleshooting agent would be slow and expensive. The session's first pass at filtering used a static keyword list, which both missed valid troubleshooting questions that didn't use the pre-defined keywords and incorrectly triggered on unrelated messages that happened to contain them.
The replacement uses Amazon Nova Micro via Amazon Bedrock to classify each incoming message before it reaches the specialist agent. The classification prompt asks a single yes-or-no question: is this message related to Kubernetes, system troubleshooting, technical issues, or a request for help? The response is capped at 10 output tokens, since "yes" and "no" require fewer than 10 tokens each. Nova Micro responds fast enough that the added latency is negligible for a Slack bot, and the cost per classification is a fraction of what it would cost to invoke the full agent. Strands provides a Before Invocation hook that intercepts each message before it reaches the LLM, which is exactly where the classification step runs. If the answer is no, the agent exits without incurring the cost of a full inference call.
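The classification gate described above can be sketched like this. The prompt wording, the model ID string, and the function names are assumptions based on the session description; the 10-token cap and the yes/no contract come from the session.

```python
CLASSIFIER_PROMPT = (
    "Is the following message related to Kubernetes, system troubleshooting, "
    "technical issues, or a request for help? Answer only 'yes' or 'no'.\n\n"
    "Message: {message}"
)


def build_classifier_request(message: str) -> dict:
    """Build Bedrock Converse API kwargs for the Nova Micro gate.

    maxTokens=10 caps the reply, since 'yes' and 'no' each fit well
    within that budget. The model ID is an assumption -- confirm the
    current Nova Micro ID in the Bedrock console.
    """
    return {
        "modelId": "amazon.nova-micro-v1:0",
        "messages": [{
            "role": "user",
            "content": [{"text": CLASSIFIER_PROMPT.format(message=message)}],
        }],
        "inferenceConfig": {"maxTokens": 10, "temperature": 0.0},
    }


def is_troubleshooting(model_text: str) -> bool:
    """Parse the yes/no reply; anything but an explicit 'yes' skips the agent."""
    return model_text.strip().lower().startswith("yes")


def classify(message: str) -> bool:
    """Run the gate against Bedrock (local import: only needed at call time)."""
    import boto3
    bedrock = boto3.client("bedrock-runtime")
    resp = bedrock.converse(**build_classifier_request(message))
    return is_troubleshooting(resp["output"]["message"]["content"][0]["text"])
```

In the Strands before-invocation hook, `classify` runs on every incoming Slack message, and a "no" short-circuits before the expensive specialist model is ever invoked.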
Tribal knowledge with S3 Vectors and a memory micro-agent. The third layer addresses a pattern common to every engineering organization: someone knows the solution to a class of problem, shares it in Slack, and months later no one can find it. The session framed this as "tribal knowledge" and built a dedicated memory agent to capture and retrieve it automatically.
The memory agent runs as a separate service with its own endpoint, using Strands' agent-to-agent (a2a) communication protocol. It has two tools: one to embed and store a solution in S3 Vectors using Amazon Titan embeddings, and one to embed an incoming query and retrieve the closest matching solutions by cosine similarity. S3 Vectors, announced in July 2025, is a purpose-built vector store in Amazon S3. Both storage and retrieval use the same Amazon Titan model for embeddings, since retrieval requires searching the same embedding space used during storage.
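The memory agent's two tools might look roughly like the sketch below. The bucket and index names are invented placeholders, and the `s3vectors` call shapes are assumptions based on the service announcement rather than verified signatures; the Titan model ID should likewise be confirmed in the Bedrock console. The cosine helper is included only to illustrate the similarity measure that the service computes server-side.

```python
import json


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity -- the distance measure used for retrieval."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)


def embed(text: str) -> list[float]:
    """Embed text with Amazon Titan. The same model is used for storage
    and retrieval so queries search the space the solutions live in."""
    import boto3  # local import: only needed for real calls
    bedrock = boto3.client("bedrock-runtime")
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]


def store_solution(key: str, solution: str) -> None:
    """Tool 1: embed a solution and write it to S3 Vectors."""
    import boto3
    s3v = boto3.client("s3vectors")
    s3v.put_vectors(
        vectorBucketName="kube-agent-memory",  # placeholder names
        indexName="solutions",
        vectors=[{"key": key,
                  "data": {"float32": embed(solution)},
                  "metadata": {"solution": solution}}],
    )


def recall(query: str, top_k: int = 3) -> list[str]:
    """Tool 2: embed the query and fetch the closest stored solutions."""
    import boto3
    s3v = boto3.client("s3vectors")
    resp = s3v.query_vectors(
        vectorBucketName="kube-agent-memory",
        indexName="solutions",
        queryVector={"float32": embed(query)},
        topK=top_k,
        returnMetadata=True,
    )
    return [v["metadata"]["solution"] for v in resp["vectors"]]
```

Keeping the solution text in vector metadata means a single query returns both the match and the usable content, with no second lookup against the original Slack message.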
The Orchestrator agent's system prompt instructs it to check memory before doing live troubleshooting. If a relevant solution is found, it uses it directly. If not, it falls back to the Specialist agent with MCP access to investigate the live cluster, then stores the solution it finds for future use. In the demo, a DevOps engineer posted a Slack tip about which container image to use for the node exporter. The agent stored it without any explicit instruction. Later, when a monitoring pod failed because it was referencing a nonexistent image tag, the agent retrieved that stored recommendation as part of its investigation and resolved the incident.
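The memory-first policy lives in the Orchestrator's system prompt rather than in code. A minimal sketch, assuming Strands' agents-as-tools wiring (the prompt wording and function names here are illustrative, not the session's exact text):

```python
ORCHESTRATOR_PROMPT = """\
You coordinate Kubernetes incident response.
1. ALWAYS query the memory agent first for a stored solution.
2. If a relevant solution is found, apply it directly.
3. Otherwise, hand the incident to the specialist agent, which can
   inspect the live cluster through the EKS MCP server.
4. After the specialist resolves the incident, store the working
   solution in memory for future use.
"""


def build_orchestrator(memory_tool, specialist_tool):
    """Assemble the Orchestrator with the two sub-agents exposed as tools.

    `memory_tool` and `specialist_tool` are assumed to be Strands tools
    that wrap the memory agent's a2a endpoint and the MCP-backed
    specialist; the exact wiring is an assumption for illustration.
    """
    from strands import Agent  # local import: only needed at runtime
    return Agent(system_prompt=ORCHESTRATOR_PROMPT,
                 tools=[memory_tool, specialist_tool])
```

Because the check-memory-first behavior is prompt-driven, changing the policy (for example, always verifying a recalled solution against live cluster state) is a prompt edit, not a code change.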
Multi-model and multi-agent design
The session used four distinct models: Claude 3 Sonnet for the troubleshooting specialist, Amazon Nova Micro for message classification, Amazon Titan for embeddings, and a second Sonnet variant for the memory agent. The selection follows the same logic as service decomposition: each model is chosen for what it does best at the required speed and cost point. Nova Micro handles binary classification at minimal latency and cost. Titan is purpose-built for generating vector embeddings. Sonnet carries the multi-step reasoning load for live cluster diagnostics.
The agents follow the same decomposition principle. The Orchestrator, Specialist, and memory agent are separate deployable units, each with a single clear responsibility. The Orchestrator discovers what other agents are available at runtime and routes work to them. The Specialist has MCP access for live Kubernetes diagnostics. The memory agent owns the read and write path to the knowledge store. Each can scale independently in a Kubernetes deployment, and the memory agent can be reused by entirely different troubleshooting bots for other domains. As the session noted, this mirrors the same reasoning that drove the shift from monolithic applications to microservices: a focused component with a clear boundary is easier to operate, scale, and extend than a single component that does everything.
The Strands SDK provides the connective tissue: the Before Invocation hook for classification, the agent-as-tools pattern for agent-to-agent communication, and the MCP client for tool discovery at runtime. The full sample code and Helm charts for deploying the system to a Kubernetes cluster are available in the AWS containers sessions GitHub repository.
Key takeaways
Building a genuinely useful Kubernetes troubleshooting agent requires more than a chat interface in front of an LLM. The combination of live data through the hosted EKS MCP server, cost-efficient routing through Nova Micro classification, and persistent institutional knowledge through S3 Vectors and a dedicated memory micro-agent produces a system that can detect an alert, reason about live cluster state, recall prior solutions, and resolve incidents for known failure modes without human involvement.
The architectural pattern demonstrated in the session, multiple specialized agents each with a focused responsibility, makes the system maintainable and extensible. Adding a new capability means adding a new agent with a specific system prompt, not rebuilding a single agent that handles everything. The same pattern applies to selecting models: choose the right one for each job rather than routing everything through the most capable model available.
Watch the full session recording: AWS re:Invent 2025 - Streamline Amazon EKS operations with Agentic AI (CNS421)