Ep 13: Retaining Memories — Window Buffer Memory & Conversation State Persistence

⏱ Est. reading time: 10 min · Updated on 4/9/2026

Why Agents Need Memory

LLMs are stateless: every call starts from scratch. Without memory:

sequenceDiagram
    participant User as 👤 User
    participant AI as 🤖 Memoryless Agent
    User->>AI: My name is Alice
    AI-->>User: Nice to meet you, Alice!
    User->>AI: What's my name?
    AI-->>User: Sorry, I don't know your name. 😅

Memory nodes inject conversation history into the prompt before each LLM call, making the model "appear" to remember.


1. Window Buffer Memory

Maintains a fixed-size sliding window of the most recent N messages.

graph TB
    subgraph "Window Buffer (size = 6)"
        subgraph "Turn 1"
            M1["👤 Hello"] --> M2["🤖 Hi!"]
        end
        subgraph "Turn 2"
            M3["👤 Weather?"] --> M4["🤖 Sunny 28°C"]
        end
        subgraph "Turn 3"
            M5["👤 Tomorrow?"] --> M6["🤖 Cloudy"]
        end
        subgraph "Turn 4 (window slides!)"
            M7["👤 Day after?"] --> M8["🤖 Rainy"]
        end
    end
    M1 -.->|"❌ Evicted"| Trash[🗑️]
    M2 -.->|"❌ Evicted"| Trash
    style Trash fill:#ef4444,stroke:#dc2626,color:#fff
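The sliding window above can be sketched in a few lines. This is a minimal illustration, not n8n's actual implementation — the class name and methods (`WindowBufferMemory`, `add`, `load`) are made up for this example:

```javascript
// A window buffer keeps only the most recent `size` messages,
// evicting the oldest ones first as new messages arrive.
class WindowBufferMemory {
  constructor(size = 6) {
    this.size = size;
    this.messages = [];
  }

  add(role, content) {
    this.messages.push({ role, content });
    // Slide the window: drop the oldest messages beyond the limit
    while (this.messages.length > this.size) {
      this.messages.shift();
    }
  }

  load() {
    return [...this.messages];
  }
}

// Replay the four turns from the diagram (8 messages, window of 6)
const memory = new WindowBufferMemory(6);
["Hello", "Hi!", "Weather?", "Sunny 28°C", "Tomorrow?", "Cloudy", "Day after?", "Rainy"]
  .forEach((text, i) => memory.add(i % 2 === 0 ? "user" : "assistant", text));

console.log(memory.load().length);     // 6 — window is full
console.log(memory.load()[0].content); // "Weather?" — "Hello"/"Hi!" were evicted
```

Exactly as in the diagram, Turn 1 ("Hello" / "Hi!") falls out of the window once Turn 4 arrives.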

Window Size Guide

| Scenario | Window Size | Reason |
| --- | --- | --- |
| Quick FAQ | 4-6 | 2-3 turns are usually enough |
| Tech Support | 10-20 | Needs the problem's context |
| Deep Consultation | 30-50 | Full conversation needed |

2. Session Isolation

graph TB
    CT[💬 Chat Trigger]
    CT -->|"sessionId: alice"| Agent[🤖 AI Agent]
    CT -->|"sessionId: bob"| Agent
    Agent --> Memory[💾 Memory]
    Memory --> S1["📂 alice: [hello, weather...]"]
    Memory --> S2["📂 bob: [help me code...]"]
    style Memory fill:#22c55e,stroke:#16a34a,color:#fff
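Session isolation boils down to keying the message store by `sessionId`. A hypothetical sketch (the helper names `getSessionMemory` and `remember` are invented for illustration):

```javascript
// One store, keyed by sessionId, so Alice's and Bob's histories never mix.
const sessions = new Map();

function getSessionMemory(sessionId) {
  if (!sessions.has(sessionId)) {
    sessions.set(sessionId, []);
  }
  return sessions.get(sessionId);
}

function remember(sessionId, role, content) {
  getSessionMemory(sessionId).push({ role, content });
}

// Two users talk to the same agent through the same memory node
remember("alice", "user", "hello");
remember("alice", "user", "weather?");
remember("bob", "user", "help me code");

console.log(getSessionMemory("alice").length); // 2
console.log(getSessionMemory("bob").length);   // 1
```

Each Chat Trigger supplies its own `sessionId`, so the memory node reads and writes only that user's folder — Bob never sees Alice's weather questions.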

3. How Memory Injects Into LLM Calls

// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// WITHOUT Memory:
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
const messages = [
  { role: "system", content: "You are an AI assistant" },
  { role: "user", content: "What about tomorrow?" }  // No context!
];

// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// WITH Memory (auto-injected):
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
const messages = [
  { role: "system", content: "You are an AI assistant" },
  { role: "user", content: "Weather in Beijing?" },     // History
  { role: "assistant", content: "Sunny, 28°C" },        // History
  { role: "user", content: "What about tomorrow?" }      // Current
];
// Model now knows "tomorrow" means Beijing weather
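The injection step itself is simple to sketch: splice the stored history between the system prompt and the current user message on every call. The function name `buildMessages` is an assumption for this example, not a real API:

```javascript
// Prepend stored history before the current turn on each LLM call
function buildMessages(systemPrompt, history, userInput) {
  return [
    { role: "system", content: systemPrompt },
    ...history,                            // injected memory
    { role: "user", content: userInput },  // current turn
  ];
}

const history = [
  { role: "user", content: "Weather in Beijing?" },
  { role: "assistant", content: "Sunny, 28°C" },
];

const messages = buildMessages(
  "You are an AI assistant",
  history,
  "What about tomorrow?"
);
console.log(messages.length); // 4 — system + 2 history + current
```

This is all "memory" is at the wire level: the model never remembers anything; the workflow re-sends the relevant history every time.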

Memory Type Comparison

| Type | Mechanism | Pros | Cons |
| --- | --- | --- | --- |
| Window Buffer | Keep last N messages | Simple, reliable | Loses everything outside the window |
| Token Buffer | Truncate by token count | Precise context control | Slightly more complex |
| Summary | LLM summarizes history | Retains key points longer | May lose details; extra API cost |
| Vector Store | Embed history into a vector DB | True "long-term memory" | Requires vector DB setup |
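To make the Token Buffer row concrete, here is a rough sketch of trimming by token budget rather than message count. The 4-characters-per-token estimate is a crude stand-in for a real tokenizer, and both function names are invented for this example:

```javascript
// Crude token estimate — real systems use the model's actual tokenizer
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Drop the oldest messages until the history fits the token budget
function trimToTokenBudget(history, maxTokens) {
  const trimmed = [...history];
  let total = trimmed.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  while (total > maxTokens && trimmed.length > 0) {
    total -= estimateTokens(trimmed.shift().content); // oldest first
  }
  return trimmed;
}

const history = [
  { role: "user", content: "12345678" },      // ~2 tokens
  { role: "assistant", content: "12345678" }, // ~2 tokens
  { role: "user", content: "12345678" },      // ~2 tokens
];

console.log(trimToTokenBudget(history, 4).length); // 2 — oldest message dropped
```

Compared with the Window Buffer, this bounds the prompt by size rather than turn count, which matters when individual messages vary widely in length.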

Next Episode

In Ep 14, we equip the Agent with real external Tools, enabling it not just to "talk" but to "act": calculate, search, and call APIs.