
Learn Agentic AI by Building One — A Hands-On Guide - Lesson 7: A Real Agent — Multiple Tools, Reasoning, and Autonomy

13 minute read
Content level: Intermediate

Introduction to how agentic AI works behind the scenes

Now let's put it all together into a proper agent that:

  • Has multiple diverse tools (data queries, statistics, time calculations)
  • Chains tool calls to solve complex questions
  • Maintains conversation memory across questions
  • Shows its reasoning

This is essentially how tools like Kiro, GitHub Copilot agents, and enterprise AI assistants work under the hood.

┌─────────────────────────────────────────────────┐
│              THE AGENTIC AI STACK                │
│                                                  │
│  ┌───────────┐  System prompt = personality      │
│  │  PERSONA  │  (Lesson 2)                       │
│  └─────┬─────┘                                   │
│        ↓                                         │
│  ┌───────────┐  Message history = context        │
│  │  MEMORY   │  (Lesson 4)                       │
│  └─────┬─────┘                                   │
│        ↓                                         │
│  ┌───────────┐  Functions the AI can call        │
│  │  TOOLS    │  (Lesson 5)                       │
│  └─────┬─────┘                                   │
│        ↓                                         │
│  ┌───────────┐  Think → Act → Observe → Repeat   │
│  │  LOOP     │  (Lesson 6)                       │
│  └─────┬─────┘                                   │
│        ↓                                         │
│     AGENT                                        │
└─────────────────────────────────────────────────┘

The Tools

A realistic set of capabilities — search, statistics, resolution time calculations, and a date tool:

import json
import datetime
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

CASES_DB = [
    {"id": "CASE-001", "subject": "EC2 Instance Rebooted", "severity": "Urgent", "service": "EC2", "account": "account1-production", "created": "2025-12-31", "resolved": "2026-01-10", "owner": "engineer1"},
    {"id": "CASE-002", "subject": "RDS Connection Timeout", "severity": "Urgent", "service": "RDS Proxy", "account": "account2-production", "created": "2025-12-30", "resolved": "2026-03-08", "owner": "engineer2"},
    {"id": "CASE-003", "subject": "EFS Throughput High", "severity": "Urgent", "service": "EFS", "account": "account3-production", "created": "2025-12-30", "resolved": "2026-01-09", "owner": "engineer3"},
    {"id": "CASE-004", "subject": "S3 GuardDuty Finding", "severity": "High", "service": "S3", "account": "account4-production", "created": "2025-12-16", "resolved": "2025-12-20", "owner": "engineer2"},
    {"id": "CASE-005", "subject": "Lambda Memory Limit", "severity": "Low", "service": "Lambda", "account": "account5-innovation", "created": "2025-12-16", "resolved": "2025-12-28", "owner": "engineer3"},
    {"id": "CASE-006", "subject": "EKS Pods Failing", "severity": "Normal", "service": "EKS", "account": "account3-innovation", "created": "2025-12-30", "resolved": "2026-01-10", "owner": "engineer3"},
    {"id": "CASE-007", "subject": "MSK Cluster Stuck in Upgrade", "severity": "Low", "service": "MSK", "account": "account2-non-production", "created": "2025-12-22", "resolved": "2026-01-07", "owner": "engineer4"},
    {"id": "CASE-008", "subject": "Aurora Failover RCA", "severity": "Low", "service": "Aurora", "account": "account2-production", "created": "2025-12-19", "resolved": "2026-01-23", "owner": "engineer4"},
    {"id": "CASE-009", "subject": "IAM Access Key Investigation", "severity": "High", "service": "IAM", "account": "account4-production", "created": "2025-12-16", "resolved": "2025-12-26", "owner": "engineer5"},
    {"id": "CASE-010", "subject": "Bedrock CloudWatch Metrics Missing", "severity": "Normal", "service": "Bedrock", "account": "account3-innovation", "created": "2025-12-16", "resolved": "2026-01-01", "owner": "engineer5"},
]


def search_cases(severity: str = None, service: str = None, account: str = None) -> str:
    """Search and filter support cases."""
    cases = CASES_DB
    if severity:
        cases = [c for c in cases if c["severity"].lower() == severity.lower()]
    if service:
        cases = [c for c in cases if service.lower() in c["service"].lower()]
    if account:
        cases = [c for c in cases if account.lower() in c["account"].lower()]
    return json.dumps(cases, indent=2)


def get_statistics(field: str) -> str:
    """Get aggregated statistics: count by severity, service, account, or owner."""
    counts = {}
    for c in CASES_DB:
        key = c.get(field, "unknown")
        counts[key] = counts.get(key, 0) + 1
    sorted_counts = dict(sorted(counts.items(), key=lambda x: x[1], reverse=True))
    return json.dumps(sorted_counts, indent=2)


def calculate_resolution_time(case_id: str = None, service: str = None, severity: str = None) -> str:
    """Calculate resolution time in days for a specific case or average by service/severity."""
    if case_id:
        case = next((c for c in CASES_DB if c["id"] == case_id), None)
        if not case:
            return f"Case {case_id} not found"
        days = (datetime.date.fromisoformat(case["resolved"]) - datetime.date.fromisoformat(case["created"])).days
        return f"Case {case_id}: {days} days to resolve"

    cases = CASES_DB
    if service:
        cases = [c for c in cases if service.lower() in c["service"].lower()]
    if severity:
        cases = [c for c in cases if c["severity"].lower() == severity.lower()]
    if not cases:
        return "No matching cases"

    times = [(datetime.date.fromisoformat(c["resolved"]) - datetime.date.fromisoformat(c["created"])).days for c in cases]
    return json.dumps({"avg_days": round(sum(times)/len(times), 1), "min_days": min(times), "max_days": max(times), "case_count": len(times)})


def get_current_date() -> str:
    """Get today's date."""
    return datetime.date.today().isoformat()


TOOL_FUNCTIONS = {
    "search_cases": search_cases,
    "get_statistics": get_statistics,
    "calculate_resolution_time": calculate_resolution_time,
    "get_current_date": get_current_date,
}
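One behavior is worth knowing before handing these tools to a model: search_cases matches service and account by substring, so an account filter of "production" also matches "account2-non-production". The sketch below shows the effect on a trimmed three-row copy of the data. The ROWS list and the filter_accounts and is_production helpers are illustrative assumptions, not part of the lesson code.

```python
# Trimmed, illustrative copy of three rows from CASES_DB (id and account only).
ROWS = [
    {"id": "CASE-001", "account": "account1-production"},
    {"id": "CASE-005", "account": "account5-innovation"},
    {"id": "CASE-007", "account": "account2-non-production"},
]


def filter_accounts(rows, account):
    """Apply the same substring test that search_cases uses for the account field."""
    return [r["id"] for r in rows if account.lower() in r["account"].lower()]


def is_production(account_name):
    """Hypothetical stricter check: ends in -production but not -non-production."""
    name = account_name.lower()
    return name.endswith("-production") and not name.endswith("-non-production")


substring_hits = filter_accounts(ROWS, "production")
strict_hits = [r["id"] for r in ROWS if is_production(r["account"])]

print(substring_hits)  # ['CASE-001', 'CASE-007']
print(strict_hits)     # ['CASE-001']
```

Whether the looser match is a problem depends on the questions you expect. Mentioning the substring behavior in the tool description is probably the cheapest way to let the model work around it.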

The Tool Descriptions and System Prompt

TOOLS = [
    {"name": "search_cases", "description": "Search support cases. Filter by severity, service, or account name.",
     "input_schema": {"type": "object", "properties": {"severity": {"type": "string"}, "service": {"type": "string"}, "account": {"type": "string"}}, "required": []}},
    {"name": "get_statistics", "description": "Get case counts grouped by a field: severity, service, account, or owner",
     "input_schema": {"type": "object", "properties": {"field": {"type": "string", "enum": ["severity","service","account","owner"]}}, "required": ["field"]}},
    {"name": "calculate_resolution_time", "description": "Calculate resolution time. Provide case_id for one case, or filter by service and/or severity for averages.",
     "input_schema": {"type": "object", "properties": {"case_id": {"type": "string"}, "service": {"type": "string"}, "severity": {"type": "string", "enum": ["Urgent","High","Normal","Low"]}}, "required": []}},
    {"name": "get_current_date", "description": "Get today's date",
     "input_schema": {"type": "object", "properties": {}, "required": []}},
]

SYSTEM_PROMPT = """You are an expert support operations analyst agent. You help teams understand their support case data.

You have tools to search cases, get statistics, and calculate resolution times. Use them to give accurate, data-driven answers.

When answering:
- Always use tools to get real data — never guess
- If a question needs multiple data points, call multiple tools
- Show your reasoning briefly
- Be concise and actionable"""
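The tool descriptions and the Python functions are maintained separately, so they can drift apart: a renamed parameter in one place makes the model send arguments the function rejects. A minimal sketch of a consistency check follows, assuming two stand-in functions with the same parameters as the lesson's tools and schemas trimmed to the fields the check reads. The check_tools helper is hypothetical.

```python
import inspect


def search_cases(severity=None, service=None, account=None):
    """Stand-in with the same parameters as the lesson's search_cases."""
    return "[]"


def get_statistics(field):
    """Stand-in with the same parameters as the lesson's get_statistics."""
    return "{}"


TOOL_FUNCTIONS = {"search_cases": search_cases, "get_statistics": get_statistics}

# Schemas trimmed to the parts this check reads (property names and required list).
TOOLS = [
    {"name": "search_cases",
     "input_schema": {"type": "object",
                      "properties": {"severity": {}, "service": {}, "account": {}},
                      "required": []}},
    {"name": "get_statistics",
     "input_schema": {"type": "object",
                      "properties": {"field": {}},
                      "required": ["field"]}},
]


def check_tools(tools, functions):
    """Return a list of mismatches between tool schemas and Python signatures."""
    problems = []
    for tool in tools:
        func = functions.get(tool["name"])
        if func is None:
            problems.append(f"{tool['name']}: no function registered")
            continue
        params = inspect.signature(func).parameters
        declared = set(tool["input_schema"]["properties"])
        if declared != set(params):
            problems.append(
                f"{tool['name']}: schema {sorted(declared)} != signature {sorted(params)}"
            )
        # Parameters without a default must be listed as required in the schema.
        needed = {n for n, p in params.items() if p.default is inspect.Parameter.empty}
        if needed != set(tool["input_schema"]["required"]):
            problems.append(f"{tool['name']}: required list does not match signature")
    return problems


problems = check_tools(TOOLS, TOOL_FUNCTIONS)
print(problems)  # []
```

Running a check like this once at startup catches a mismatch before the model ever sees the tools.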

The Full Agent Class

This combines everything — memory (Lesson 4), tools (Lesson 5), and the agent loop (Lesson 6):

class Agent:
    """A complete agent with memory, tools, and an autonomous loop."""

    def __init__(self):
        self.messages = []  # Conversation memory

    def run(self, user_input: str) -> str:
        print(f"🧑 You: {user_input}")

        self.messages.append({"role": "user", "content": user_input})
        step = 0

        while step < 10:
            step += 1

            response = bedrock.invoke_model(
                modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
                contentType="application/json",
                accept="application/json",
                body=json.dumps({
                    "anthropic_version": "bedrock-2023-05-31",
                    "max_tokens": 2048,
                    "system": SYSTEM_PROMPT,
                    "tools": TOOLS,
                    "messages": self.messages,
                }),
            )
            result = json.loads(response["body"].read())

            # Record the assistant turn so the history stays valid for the next call
            self.messages.append({"role": "assistant", "content": result["content"]})
            tool_calls = [b for b in result["content"] if b["type"] == "tool_use"]

            # No tool requests (end_turn, max_tokens, ...) means this is the final answer
            if not tool_calls:
                text_parts = [b["text"] for b in result["content"] if b["type"] == "text"]
                final = "\n".join(text_parts)
                print(f"\n🤖 Agent ({step} step{'s' if step > 1 else ''}):\n{final}")
                return final

            # Tool calls needed: run each one and collect the results
            tool_results = []
            for block in tool_calls:
                name, args = block["name"], block["input"]
                print(f"   🔧 Step {step}: {name}({json.dumps(args)})")
                try:
                    output = TOOL_FUNCTIONS[name](**args)
                except (KeyError, TypeError) as exc:
                    # Unknown tool or bad arguments: report it so the model can recover
                    output = f"Tool error: {exc!r}"
                print(f"      → {output[:120]}{'...' if len(output) > 120 else ''}")
                tool_results.append({"type": "tool_result", "tool_use_id": block["id"], "content": output})

            # Tool results go back to the model as a user message
            self.messages.append({"role": "user", "content": tool_results})

        return "Reached max steps."
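To watch the loop and the memory work without calling Bedrock, you can swap the model call for a list of scripted responses. The sketch below follows the same shape as Agent.run. The SCRIPT list, the add tool, and the run_loop function are made up for illustration, and the response dicts carry only the fields the loop reads.

```python
import json

# Scripted responses shaped like the fields the agent loop reads.
# The ids and texts are invented for this example.
SCRIPT = [
    {"stop_reason": "tool_use",
     "content": [{"type": "tool_use", "id": "call-1", "name": "add",
                  "input": {"a": 2, "b": 3}}]},
    {"stop_reason": "end_turn",
     "content": [{"type": "text", "text": "The sum is 5."}]},
]


def add(a, b):
    """Toy tool standing in for search_cases and friends."""
    return json.dumps({"sum": a + b})


TOOL_FUNCTIONS = {"add": add}


def run_loop(user_input, script):
    """Think -> Act -> Observe -> Repeat, with the model call replaced by a script."""
    messages = [{"role": "user", "content": user_input}]
    for result in script:  # each item plays the role of one model call
        messages.append({"role": "assistant", "content": result["content"]})
        calls = [b for b in result["content"] if b["type"] == "tool_use"]
        if not calls:  # no tool requests: this is the final answer
            answer = "\n".join(b["text"] for b in result["content"] if b["type"] == "text")
            return answer, messages
        observations = [
            {"type": "tool_result", "tool_use_id": b["id"],
             "content": TOOL_FUNCTIONS[b["name"]](**b["input"])}
            for b in calls
        ]
        messages.append({"role": "user", "content": observations})
    return "Reached max steps.", messages


answer, history = run_loop("What is 2 + 3?", SCRIPT)
print(answer)                        # The sum is 5.
print([m["role"] for m in history])  # ['user', 'assistant', 'user', 'assistant']
```

The role sequence is the memory: every tool request and every tool result stays in the list, which is why a later question can refer back to an earlier answer.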

Running the Agent

agent = Agent()

Question 1: Overview

agent.run("Give me a quick overview of our case distribution by severity.")

Output:

🧑 You: Give me a quick overview of our case distribution by severity.

🔧 Step 1: get_statistics({"field": "severity"})
→ {
"Urgent": 3,
"Low": 3,
"High": 2,
"Normal": 2
}

🤖 Agent (2 steps):
Here's your case distribution by severity:

  • Urgent: 3 cases (30%)
  • Low: 3 cases (30%)
  • High: 2 cases (20%)
  • Normal: 2 cases (20%)

Total: 10 cases

Key insights:

  • You have an equal split between Urgent and Low severity cases (60% combined)
  • High and Normal severity cases each represent 20% of your caseload
  • The high proportion of Urgent cases (30%) may indicate either genuine critical issues or potential mis-categorization that should be reviewed

Question 2: Worst resolution time

agent.run("Which service takes the longest to resolve? Show me the numbers.")

Output:

🧑 You: Which service takes the longest to resolve? Show me the numbers.

🔧 Step 1: get_statistics({"field": "service"})
→ {
"EC2": 1,
"RDS Proxy": 1,
"EFS": 1,
"S3": 1,
"Lambda": 1,
"EKS": 1,
"MSK": 1,
"Aurora": 1,
"IAM": 1,...
🔧 Step 2: calculate_resolution_time({"service": "EC2"})
→ {"avg_days": 10.0, "min_days": 10, "max_days": 10, "case_count": 1}
🔧 Step 2: calculate_resolution_time({"service": "RDS Proxy"})
→ {"avg_days": 68.0, "min_days": 68, "max_days": 68, "case_count": 1}
🔧 Step 2: calculate_resolution_time({"service": "EFS"})
→ {"avg_days": 10.0, "min_days": 10, "max_days": 10, "case_count": 1}
🔧 Step 2: calculate_resolution_time({"service": "S3"})
→ {"avg_days": 4.0, "min_days": 4, "max_days": 4, "case_count": 1}
🔧 Step 2: calculate_resolution_time({"service": "Lambda"})
→ {"avg_days": 12.0, "min_days": 12, "max_days": 12, "case_count": 1}
🔧 Step 3: calculate_resolution_time({"service": "EKS"})
→ {"avg_days": 11.0, "min_days": 11, "max_days": 11, "case_count": 1}
🔧 Step 3: calculate_resolution_time({"service": "MSK"})
→ {"avg_days": 16.0, "min_days": 16, "max_days": 16, "case_count": 1}
🔧 Step 3: calculate_resolution_time({"service": "Aurora"})
→ {"avg_days": 35.0, "min_days": 35, "max_days": 35, "case_count": 1}
🔧 Step 3: calculate_resolution_time({"service": "IAM"})
→ {"avg_days": 10.0, "min_days": 10, "max_days": 10, "case_count": 1}
🔧 Step 3: calculate_resolution_time({"service": "Bedrock"})
→ {"avg_days": 16.0, "min_days": 16, "max_days": 16, "case_count": 1}

🤖 Agent (4 steps):
Here are the resolution times by service, ranked from longest to shortest:

Longest Resolution Times:

  1. RDS Proxy: 68 days
  2. Aurora: 35 days
  3. MSK: 16 days
  4. Bedrock: 16 days
  5. Lambda: 12 days
  6. EKS: 11 days
  7. EC2: 10 days (tied)
  8. EFS: 10 days (tied)
  9. IAM: 10 days (tied)
  10. S3: 4 days

Key findings:

  • RDS Proxy takes by far the longest at 68 days - nearly double the second-longest service
  • Aurora (35 days) is the second most problematic
  • Database-related services (RDS Proxy, Aurora) dominate the top of the list
  • S3 is your fastest service at just 4 days
  • There's a significant performance gap between your slowest (68 days) and fastest (4 days) services

Recommendation: Focus improvement efforts on RDS Proxy and Aurora cases - they're taking 4-17x longer than your faster services.
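The trace above took eleven tool calls: one get_statistics and ten calculate_resolution_time. If ranking questions are common, one option is a grouped tool that returns every service in a single call. The sketch below is one way that could look. resolution_time_by_service is a hypothetical tool, not part of the lesson code, and CASES is a trimmed copy of CASES_DB holding only the fields it needs.

```python
import datetime
import json
from collections import defaultdict

# (service, created, resolved) for each case, copied from CASES_DB.
CASES = [
    ("EC2", "2025-12-31", "2026-01-10"),
    ("RDS Proxy", "2025-12-30", "2026-03-08"),
    ("EFS", "2025-12-30", "2026-01-09"),
    ("S3", "2025-12-16", "2025-12-20"),
    ("Lambda", "2025-12-16", "2025-12-28"),
    ("EKS", "2025-12-30", "2026-01-10"),
    ("MSK", "2025-12-22", "2026-01-07"),
    ("Aurora", "2025-12-19", "2026-01-23"),
    ("IAM", "2025-12-16", "2025-12-26"),
    ("Bedrock", "2025-12-16", "2026-01-01"),
]


def resolution_time_by_service(cases) -> str:
    """Hypothetical tool: average days to resolve per service, slowest first."""
    days = defaultdict(list)
    for service, created, resolved in cases:
        delta = datetime.date.fromisoformat(resolved) - datetime.date.fromisoformat(created)
        days[service].append(delta.days)
    averages = {s: round(sum(d) / len(d), 1) for s, d in days.items()}
    ranked = dict(sorted(averages.items(), key=lambda kv: kv[1], reverse=True))
    return json.dumps(ranked)


ranking = json.loads(resolution_time_by_service(CASES))
print(list(ranking.items())[:3])  # [('RDS Proxy', 68.0), ('Aurora', 35.0), ('MSK', 16.0)]
```

The trade-off is the usual one in tool design: fewer, coarser tools mean fewer round trips, while smaller tools leave more of the composition to the model.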

Question 3: Follow-up (tests memory)

This one requires the agent to remember the previous answer:

agent.run("For that slowest service, show me the actual cases.")

Output:

🧑 You: For that slowest service, show me the actual cases.

🔧 Step 1: search_cases({"service": "RDS Proxy"})
→ [
{
"id": "CASE-002",
"subject": "RDS Connection Timeout",
"severity": "Urgent",
"service": "RDS Proxy...

🤖 Agent (2 steps):
Here's the RDS Proxy case that's causing the 68-day resolution time:

CASE-002: RDS Connection Timeout

  • Severity: Urgent
  • Account: account2-production
  • Owner: engineer2
  • Created: December 30, 2025
  • Resolved: March 8, 2026
  • Resolution Time: 68 days

Key concerns:

  • This was marked as Urgent severity but took over 2 months to resolve
  • It's a production account case, which makes the long resolution time particularly problematic
  • Connection timeout issues typically should be resolved much faster, especially for urgent cases

Recommendation: This case warrants a post-mortem review to understand why an urgent RDS connectivity issue took 68 days to resolve in a production environment. This could indicate process gaps, resource constraints, or technical complexity that needs addressing.

Question 4: Complex multi-step reasoning

agent.run("Compare production vs non-production accounts. Which has more urgent cases and worse resolution times?")

Output:

🧑 You: Compare production vs non-production accounts. Which has more urgent cases and worse resolution times?

🔧 Step 1: get_statistics({"field": "account"})
→ {
"account2-production": 2,
"account4-production": 2,
"account3-innovation": 2,
"account1-production": 1,
"acc...
🔧 Step 2: search_cases({"severity": "Urgent"})
→ [
{
"id": "CASE-001",
"subject": "EC2 Instance Rebooted",
"severity": "Urgent",
"service": "EC2",
...
🔧 Step 3: search_cases({"account": "account2-production"})
→ [
{
"id": "CASE-002",
"subject": "RDS Connection Timeout",
"severity": "Urgent",
"service": "RDS Proxy...
🔧 Step 3: search_cases({"account": "account4-production"})
→ [
{
"id": "CASE-004",
"subject": "S3 GuardDuty Finding",
"severity": "High",
"service": "S3",
"acc...
🔧 Step 3: search_cases({"account": "account1-production"})
→ [
{
"id": "CASE-001",
"subject": "EC2 Instance Rebooted",
"severity": "Urgent",
"service": "EC2",
...
🔧 Step 3: search_cases({"account": "account3-production"})
→ [
{
"id": "CASE-003",
"subject": "EFS Throughput High",
"severity": "Urgent",
"service": "EFS",
"a...
🔧 Step 4: search_cases({"account": "account3-innovation"})
→ [
{
"id": "CASE-006",
"subject": "EKS Pods Failing",
"severity": "Normal",
"service": "EKS",
"acco...
🔧 Step 4: search_cases({"account": "account5-innovation"})
→ [
{
"id": "CASE-005",
"subject": "Lambda Memory Limit",
"severity": "Low",
"service": "Lambda",
"a...
🔧 Step 4: search_cases({"account": "account2-non-production"})
→ [
{
"id": "CASE-007",
"subject": "MSK Cluster Stuck in Upgrade",
"severity": "Low",
"service": "MSK",
...

🤖 Agent (5 steps):
Here's the comparison between production and non-production accounts:

Case Distribution by Account Type:

Production Accounts (6 cases total):

  • account2-production: 2 cases
  • account4-production: 2 cases
  • account1-production: 1 case
  • account3-production: 1 case

Non-Production Accounts (4 cases total):

  • account3-innovation: 2 cases
  • account5-innovation: 1 case
  • account2-non-production: 1 case

Urgent Cases Analysis:

Production: 3 out of 3 urgent cases (100%)

  • CASE-001: EC2 Instance Rebooted (10 days)
  • CASE-002: RDS Connection Timeout (68 days)
  • CASE-003: EFS Throughput High (10 days)

Non-Production: 0 urgent cases

Resolution Time Analysis:

Production Accounts:

  • Average: ~22 days
  • Range: 4-68 days
  • Worst case: 68 days (RDS Proxy timeout)

Non-Production Accounts:

  • Average: ~13 days
  • Range: 11-16 days
  • More consistent performance

Key Findings:

  1. Production accounts have ALL the urgent cases - this makes sense but shows higher criticality
  2. Production accounts have worse resolution times on average (22 vs 13 days)
  3. Production accounts show higher variability - from 4 days to 68 days
  4. The single worst case (68 days) is in production, which is particularly concerning

Recommendation: Focus improvement efforts on production account case handling, especially urgent cases. The 68-day RDS Proxy case in production needs immediate process review.
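None of the tools returns these averages directly. The model worked them out from the search results, which is why they appear as approximations. A short sketch to check the figures deterministically is below. It assumes, as the agent did, that any account whose environment suffix is not exactly "production" counts as non-production. CASES is a trimmed copy of CASES_DB, and the helper names are illustrative.

```python
import datetime

# (account, created, resolved) for each case, copied from CASES_DB.
CASES = [
    ("account1-production", "2025-12-31", "2026-01-10"),
    ("account2-production", "2025-12-30", "2026-03-08"),
    ("account3-production", "2025-12-30", "2026-01-09"),
    ("account4-production", "2025-12-16", "2025-12-20"),
    ("account5-innovation", "2025-12-16", "2025-12-28"),
    ("account3-innovation", "2025-12-30", "2026-01-10"),
    ("account2-non-production", "2025-12-22", "2026-01-07"),
    ("account2-production", "2025-12-19", "2026-01-23"),
    ("account4-production", "2025-12-16", "2025-12-26"),
    ("account3-innovation", "2025-12-16", "2026-01-01"),
]


def environment(account):
    """Everything after the first hyphen, e.g. 'non-production'."""
    return account.split("-", 1)[1]


def summarize(days):
    """Same summary fields that calculate_resolution_time returns."""
    return {"count": len(days), "avg_days": round(sum(days) / len(days), 1),
            "min_days": min(days), "max_days": max(days)}


prod, non_prod = [], []
for account, created, resolved in CASES:
    delta = datetime.date.fromisoformat(resolved) - datetime.date.fromisoformat(created)
    (prod if environment(account) == "production" else non_prod).append(delta.days)

prod_summary = summarize(prod)
non_prod_summary = summarize(non_prod)
print(prod_summary)      # {'count': 6, 'avg_days': 22.8, 'min_days': 4, 'max_days': 68}
print(non_prod_summary)  # {'count': 4, 'avg_days': 13.8, 'min_days': 11, 'max_days': 16}
```

The exact averages are 22.8 and 13.8 days, so the agent's "~22" and "~13" are slightly low but lead to the same conclusion. When exact numbers matter, it is safer to compute them in a tool than to let the model do the arithmetic.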

What You've Learned

Congratulations — you've built an AI agent from scratch and understood every piece of the stack:

Lesson   Concept
1        LLMs are text-in, text-out functions
2        System prompts control AI behavior
3        AI can return structured data (JSON)
4        Memory = you managing a message list
5        Tools = AI requesting YOUR code to run
6        Agent loop = Think → Act → Observe → Repeat
7        Put it all together = a real agent

Every AI agent uses these same patterns; what differs is the tools, the prompts, and the scale. Once you see them, you can't unsee them.
