Learn Agentic AI by Building One — A Hands-On Guide - Lesson 4: Memory — Giving AI Conversation History

4 minute read
Content level: Intermediate

Introduction to how Agentic AI works behind the scenes

By default, each LLM call is STATELESS. The AI has no memory. If you tell it your name in one call and ask "What's my name?" in the next, it has no idea, because every call starts fresh.

To create a "conversation", YOU manage the history by sending all previous messages with each new call:

Call 1: messages = [{user: "I'm Alice"}]
Call 2: messages = [{user: "I'm Alice"}, {assistant: "Hi Alice!"}, {user: "What's my name?"}]

The AI doesn't "remember"; you're just giving it the full script each time. Chat applications such as ChatGPT rely on this same pattern: the application, not the model, keeps the history and resends it on each turn.

┌─────────────────────────────────────┐
│  Your code manages the history      │
│  [msg1, msg2, msg3, ...] → LLM     │
│  LLM sees ALL messages each time    │
└─────────────────────────────────────┘
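The two calls above use shorthand. As a minimal sketch of the same idea, assuming each message is a dict with "role" and "content" keys (the shape the Conversation class in this lesson sends), the snippet below builds both payloads offline. The helper name add_turn is hypothetical and the assistant reply is hard-coded; no model is called.

```python
# Hypothetical helper for illustration only; it is not part of any SDK.
def add_turn(history: list[dict], role: str, text: str) -> list[dict]:
    """Return a new history list with one more message appended."""
    return history + [{"role": role, "content": text}]


# Call 1: the request contains only the first user message.
call_1 = add_turn([], "user", "I'm Alice")

# The reply (hard-coded here) is stored by YOUR code, not by the model.
after_reply = add_turn(call_1, "assistant", "Hi Alice!")

# Call 2: the request repeats everything so far, plus the new question.
call_2 = add_turn(after_reply, "user", "What's my name?")

for i, payload in enumerate([call_1, call_2], start=1):
    print(f"Call {i} sends {len(payload)} message(s):")
    for m in payload:
        print(f"  {m['role']}: {m['content']}")
```

Call 2 carries three messages, and the first of them is the same message that made up all of Call 1. That repetition is the entire "memory" mechanism.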

The Code

Here's a simple Conversation class. The self.messages list IS the memory:

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")


class Conversation:
    """A simple conversation manager — this IS the 'memory'."""

    def __init__(self, system_prompt: str = None):
        self.system = system_prompt
        self.messages = []  # ← This list IS the memory

    def say(self, user_message: str) -> str:
        # Add the user's message to history
        self.messages.append({"role": "user", "content": user_message})

        # Send the ENTIRE history to the LLM
        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": self.messages,  # ← All messages, every time
        }
        if self.system:
            body["system"] = self.system

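        try:
            response = bedrock.invoke_model(
                modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
                contentType="application/json",
                accept="application/json",
                body=json.dumps(body),
            )
        except Exception:
            # If the call fails, drop the user message we just added so a retry
            # does not leave two consecutive user turns in the history
            self.messages.pop()
            raise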
        result = json.loads(response["body"].read())
        assistant_reply = result["content"][0]["text"]

        # Add the AI's reply to history too
        self.messages.append({"role": "assistant", "content": assistant_reply})

        return assistant_reply

Demo: Without Memory vs With Memory

Without memory (each call is isolated):

# Using a simple one-shot call_llm function (no history)

def call_llm_simple(prompt: str) -> str:
    response = bedrock.invoke_model(
        modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]

print("Call 1:", call_llm_simple("My favorite color is blue. Remember that."))
print("Call 2:", call_llm_simple("What is my favorite color?"))

Output:

Call 1: Got it! I'll remember that your favorite color is blue. Thanks for sharing that with me.
Call 2: I don't have any information about your favorite color. Could you tell me what it is? I'd be happy to chat about colors once I know which one you prefer!

It doesn't know. Each call is independent.

With memory (conversation history):

chat = Conversation("You are a helpful assistant. Keep responses to 1 sentence.")

exchanges = [
    "My name is Alex and I work on the cloud infrastructure team.",
    "My team handles about 30 support cases per month.",
    "What's my name and what does my team do?",
    "How many cases did I say we handle?",
]

for msg in exchanges:
    reply = chat.say(msg)
    print(f"You: {msg}")
    print(f"AI:  {reply}\n")

print(f"Messages in history: {len(chat.messages)}")

Output:

You: My name is Alex and I work on the cloud infrastructure team.
AI: Nice to meet you Alex! Cloud infrastructure is such an important and dynamic field to work in.

You: My team handles about 30 support cases per month.
AI: That sounds like a manageable but steady workload that keeps your team engaged with a good variety of infrastructure challenges.

You: What's my name and what does my team do?
AI: Your name is Alex and your team works on cloud infrastructure, handling about 30 support cases per month.

You: How many cases did I say we handle?
AI: You said your team handles about 30 support cases per month.

Messages in history: 8

Each call sent ALL previous messages to the LLM. The AI could "remember" because we gave it the full conversation every time.
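To watch the payload grow without calling Bedrock, here is a rough offline sketch. OfflineConversation and fake_llm are hypothetical names introduced for illustration: the class does the same bookkeeping as Conversation, but the model call is an injected function that only records how much it was sent and returns a canned reply.

```python
from typing import Callable


class OfflineConversation:
    """Same history bookkeeping as Conversation, with the model call injected."""

    def __init__(self, llm: Callable[[list[dict]], str]):
        self.llm = llm
        self.messages: list[dict] = []

    def say(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        reply = self.llm(self.messages)  # the full history goes in every time
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# One (message count, total characters) entry per call
sent_sizes: list[tuple[int, int]] = []


def fake_llm(messages: list[dict]) -> str:
    """Stand-in for the model: record the payload size, return a canned reply."""
    chars = sum(len(m["content"]) for m in messages)
    sent_sizes.append((len(messages), chars))
    return "OK"


offline_chat = OfflineConversation(fake_llm)
for msg in [
    "My name is Alex.",
    "My team handles about 30 support cases per month.",
    "What's my name?",
    "How many cases did I say we handle?",
]:
    offline_chat.say(msg)

for i, (count, chars) in enumerate(sent_sizes, start=1):
    print(f"Call {i}: {count} messages, {chars} characters sent")
```

The fake model receives 1, 3, 5, and then 7 messages across the four calls, and the character total rises with each one. Characters are used here only as a crude proxy for tokens.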

Key Takeaway

The model has no built-in memory. YOU manage it as a message list. Because the whole list is resent on every call, longer conversations mean more input tokens per call, which means more cost and latency. This is why some chatbots "forget" things in long conversations: they trim the history to save tokens or to fit the model's context window.
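One simple way to trim is a sliding window that keeps only the most recent messages. The sketch below is one possible approach, not how any particular chatbot does it. The function name trim_history and the max_messages parameter are hypothetical, and it assumes the API expects the kept window to start with a user turn.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep at most the last `max_messages` messages, starting on a user turn."""
    if max_messages <= 0:
        return []
    trimmed = messages[-max_messages:]
    # Drop leading assistant messages so the window begins with a user message
    # (assumption: the API expects a conversation to open with a user turn).
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


history = [
    {"role": "user", "content": "My name is Alex."},
    {"role": "assistant", "content": "Nice to meet you, Alex!"},
    {"role": "user", "content": "My team handles about 30 support cases per month."},
    {"role": "assistant", "content": "That sounds like a steady workload."},
    {"role": "user", "content": "We mostly work on cloud infrastructure."},
    {"role": "assistant", "content": "Got it."},
    {"role": "user", "content": "What's my name?"},
]

window = trim_history(history, max_messages=4)
for m in window:
    print(f"{m['role']}: {m['content']}")
```

With a window of four, the messages that mentioned the name are no longer sent, so the model would have nothing to answer the final question from. That is the "forgetting" described above: the trade is lower cost per call against lost context.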

Next up: Lesson 5 — The big one: giving AI the ability to use TOOLS →
Previous: Lesson 3 — Getting structured data back (not just free text)→