Episode 3: Inside Claude-Mem's Brain — Data Storage Architecture — The Complete Claude-Mem Hands-On Guide

This episode's scenario: You've been using Claude Code on the blog project for two sessions. The Web UI shows a bunch of Observations. Questions arise: Where is this data stored? What does it look like? Why two databases?

3.1 Physical Storage Location

All Claude-Mem data lives in a single directory on your local drive:

~/.claude-mem/
├── claude-mem.db          # SQLite database (core)
├── settings.json          # Configuration
├── chroma/                # ChromaDB vector database
│   └── ...
└── logs/                  # Runtime logs
    └── worker.log

No data is ever uploaded to the cloud. Your development memory belongs entirely to you.

3.2 SQLite — The Structured Storage Engine

Why SQLite?

Feature	Benefit
Zero config	No database server to install
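Single file	Entire DB is one `.db` file; copy it to back up (in WAL mode, copy while nothing is writing, because recent changes may still sit in the `-wal` sidecar file)
WAL mode	Write-Ahead Logging lets reads proceed while a write is in progress, so Hooks and the Worker rarely block each other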
Built-in FTS5	Full-text search engine for fast keyword lookups
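To make the WAL row concrete, here is a minimal sketch using Python's standard `sqlite3` module. It runs against a throwaway temp file, not the real Claude-Mem database, and the `notes` table is purely hypothetical. It shows a second connection reading committed rows while the first connection still has an uncommitted write open.

```python
import os
import sqlite3
import tempfile

# WAL needs a real file; an in-memory database cannot use it.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(db_path)
# The PRAGMA returns the journal mode that is now in effect.
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE notes (body TEXT)")
writer.execute("INSERT INTO notes VALUES ('committed')")
writer.commit()

# Start a second write but leave it uncommitted for now.
writer.execute("INSERT INTO notes VALUES ('pending')")

# A separate connection can still read; it sees only committed rows.
reader = sqlite3.connect(db_path)
visible = [row[0] for row in reader.execute("SELECT body FROM notes")]

writer.commit()
reader.close()
writer.close()
print(mode, visible)
```

WAL still allows only one writer at a time; what it removes is the blocking between a writer and its readers.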

Four Core Tables

Table 1: `sdk_sessions` — Session Records

A new row is created each time you start a Claude Code session.

SELECT sdk_session_id, project, status, created_at
FROM sdk_sessions
ORDER BY created_at DESC LIMIT 5;

┌──────────────────┬─────────────┬───────────┬─────────────────────┐
│ sdk_session_id   │ project     │ status    │ created_at          │
├──────────────────┼─────────────┼───────────┼─────────────────────┤
│ sess_abc123      │ my-blog     │ completed │ 2026-04-21 10:30:00 │
│ sess_def456      │ my-blog     │ completed │ 2026-04-20 14:00:00 │
│ sess_ghi789      │ other-proj  │ completed │ 2026-04-19 09:15:00 │
└──────────────────┴─────────────┴───────────┴─────────────────────┘
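If you would rather script this than type SQL by hand, the same query can be run from Python. This is a sketch under assumptions: the helper names `open_read_only` and `recent_sessions` are hypothetical, and the demo builds a tiny in-memory table with only the columns shown above, so the real schema may differ.

```python
import sqlite3
from pathlib import Path

def open_read_only(db_path):
    """Open a SQLite file read-only so a script cannot alter stored memory."""
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)

def recent_sessions(conn, limit=5):
    """Return the newest sessions as a list of dicts."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT sdk_session_id, project, status, created_at "
        "FROM sdk_sessions ORDER BY created_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [dict(row) for row in rows]

# Demo on an in-memory stand-in (assumed columns, sample rows from the table above).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE sdk_sessions "
    "(sdk_session_id TEXT, project TEXT, status TEXT, created_at TEXT)"
)
demo.executemany(
    "INSERT INTO sdk_sessions VALUES (?, ?, ?, ?)",
    [
        ("sess_ghi789", "other-proj", "completed", "2026-04-19 09:15:00"),
        ("sess_abc123", "my-blog", "completed", "2026-04-21 10:30:00"),
        ("sess_def456", "my-blog", "completed", "2026-04-20 14:00:00"),
    ],
)
sessions = recent_sessions(demo, limit=2)
print([s["sdk_session_id"] for s in sessions])
```

Against the real file you would call `recent_sessions(open_read_only("~/.claude-mem/claude-mem.db"))`. Opening read-only is a deliberate choice here: an inspection script has no reason to write to your memory store.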

Table 2: `observations` — The Heart of Claude-Mem

Every time Claude executes a tool (read file, write file, run command), the Worker generates an Observation.

SELECT id, title, type, tool_name
FROM observations WHERE session_id = 'sess_abc123'
ORDER BY prompt_number;

┌────┬──────────────────────────────┬──────────┬────────────┐
│ id │ title                        │ type     │ tool_name  │
├────┼──────────────────────────────┼──────────┼────────────┤
│ 1  │ Read prisma schema file      │ discovery│ Read       │
│ 2  │ Add Comment model            │ feature  │ Write      │
│ 3  │ Run prisma migrate           │ change   │ Bash       │
│ 4  │ Fix foreign key error        │ bugfix   │ Write      │
│ 5  │ Choose cascade delete policy │ decision │ Write      │
└────┴──────────────────────────────┴──────────┴────────────┘

Each Observation contains rich fields:

Field	Meaning	Example
`title`	One-line summary	"Fix JWT refresh logic"
`narrative`	Detailed account (third person)	"The developer discovered the refreshToken function was missing..."
`facts`	Extracted facts list	["Uses jsonwebtoken library", "Token TTL is 7 days"]
`concepts`	Related concepts	["JWT", "authentication", "middleware"]
`type`	Category (6 types)	"bugfix"
`tool_name`	Triggering tool	"Write"
`files_read`	Files read	["src/auth/jwt.ts"]
`files_modified`	Files changed	["src/auth/jwt.ts"]
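One way to picture an Observation in code is as a small typed record. The sketch below assumes the list-valued fields (`facts`, `concepts`, `files_read`, `files_modified`) are stored as JSON text in their columns; that storage detail is an assumption, and the `Observation` class and `observation_from_row` helper are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass, field, fields

@dataclass
class Observation:
    title: str
    narrative: str
    type: str
    tool_name: str
    facts: list = field(default_factory=list)
    concepts: list = field(default_factory=list)
    files_read: list = field(default_factory=list)
    files_modified: list = field(default_factory=list)

# Columns assumed to hold JSON-encoded lists.
LIST_FIELDS = ("facts", "concepts", "files_read", "files_modified")

def observation_from_row(row):
    """Build an Observation from a row dict, decoding JSON list columns."""
    known = {f.name for f in fields(Observation)}
    data = {key: value for key, value in dict(row).items() if key in known}
    for name in LIST_FIELDS:
        raw = data.get(name)
        data[name] = json.loads(raw) if raw else []
    return Observation(**data)

# Sample row built from the example values in the table above.
row = {
    "id": 4,
    "title": "Fix JWT refresh logic",
    "narrative": "The developer discovered the refreshToken function was missing...",
    "type": "bugfix",
    "tool_name": "Write",
    "facts": '["Uses jsonwebtoken library", "Token TTL is 7 days"]',
    "concepts": '["JWT", "authentication", "middleware"]',
    "files_read": '["src/auth/jwt.ts"]',
    "files_modified": '["src/auth/jwt.ts"]',
}
obs = observation_from_row(row)
print(obs.title, obs.facts)
```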

Table 3: `user_prompts` — Your Inputs

Records every message you send to Claude within each session.

SELECT prompt_number, content FROM user_prompts
WHERE session_id = 'sess_abc123' ORDER BY prompt_number;

┌───────────────┬──────────────────────────────────────┐
│ prompt_number │ content                              │
├───────────────┼──────────────────────────────────────┤
│ 1             │ Add a comment feature to the blog    │
│ 2             │ Comments should support Markdown     │
│ 3             │ Fix that foreign key error           │
└───────────────┴──────────────────────────────────────┘

Table 4: `session_summaries` — Session Wrap-Up

Auto-generated structured summary when a session ends.

{
  "request": "Add comment system to blog",
  "investigated": ["Prisma relation model syntax", "Cascade delete strategies"],
  "completed": ["Created Comment model", "Implemented comment API", "Fixed FK constraint"],
  "next_steps": ["Build comment UI", "Implement comment notifications"]
}

Notice the `next_steps` field: it is automatically injected into Claude's context at the start of the next session, so the AI knows what you left unfinished.
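To illustrate the idea of carrying `next_steps` forward, here is a rough sketch that turns a summary like the one above into a short context note. The function name and the wording of the note are hypothetical; the format Claude-Mem actually injects is not described in this episode.

```python
import json

def next_steps_context(summary_json):
    """Format a session summary's unfinished work as a plain-text note."""
    summary = json.loads(summary_json)
    steps = summary.get("next_steps", [])
    if not steps:
        return ""
    lines = [
        f"Previous session request: {summary.get('request', 'unknown')}",
        "Unfinished work:",
    ]
    lines += [f"- {step}" for step in steps]
    return "\n".join(lines)

summary_json = json.dumps({
    "request": "Add comment system to blog",
    "completed": ["Created Comment model"],
    "next_steps": ["Build comment UI", "Implement comment notifications"],
})
note = next_steps_context(summary_json)
print(note)
```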

3.3 ChromaDB — The Semantic Vector Search Engine

SQLite already has FTS5 full-text search. Why also ChromaDB?

Because keyword search and semantic search are fundamentally different capabilities:

Search Type	Engine	When searching for "login bug"...
Keyword search	SQLite FTS5	✅ Matches records containing "login" and "bug" ❌ Misses "authentication error" records
Semantic search	ChromaDB	✅ Matches "login bug" ✅ Also matches "authentication error" (semantically similar) ✅ Even matches "JWT token expiration issue"
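The keyword row of this comparison is easy to reproduce. The sketch below builds a throwaway FTS5 index in memory (the `obs_fts` table name and the three sample titles are made up for the demo, and it assumes your Python's SQLite build includes FTS5, which most do) and searches it for "login bug".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE obs_fts USING fts5(title)")
conn.executemany(
    "INSERT INTO obs_fts (title) VALUES (?)",
    [
        ("Fix login bug in session handler",),
        ("Resolve authentication error on sign-in",),
        ("JWT token expiration issue",),
    ],
)

# In FTS5, two bare words mean "both terms must appear".
hits = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM obs_fts WHERE obs_fts MATCH ? ORDER BY rank",
        ("login bug",),
    )
]
print(hits)
```

Only the first title comes back. The other two describe closely related problems but share no tokens with the query, which is exactly the gap the semantic engine is meant to cover.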

How ChromaDB Works

graph LR
    A["New Observation:<br/>Fix JWT refresh logic"] --> B["Vectorize<br/>(text → number vector)"]
    B --> C["Store in ChromaDB<br/>[0.23, -0.41, 0.87, ...]"]

    D["User search:<br/>login issue"] --> E["Vectorize<br/>(query → number vector)"]
    E --> F["Vector similarity calculation"]
    C --> F
    F --> G["Return closest matches"]

    style A fill:#f59e0b,color:#000
    style D fill:#6366f1,color:#fff
    style G fill:#10b981,color:#fff
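The "vector similarity calculation" step in the diagram can be sketched in a few lines. The three-number vectors below are hand-made stand-ins (real embeddings come from a model and have hundreds of dimensions), and cosine similarity is just one common measure; which metric Claude-Mem's ChromaDB collection uses is not stated here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors: the first axis loosely stands for "auth-related".
stored = {
    "Fix JWT refresh logic": [0.9, 0.1, 0.2],
    "Resolve authentication error": [0.8, 0.3, 0.1],
    "Update blog CSS theme": [0.1, 0.9, 0.4],
}
query_vector = [0.85, 0.2, 0.15]  # stands in for the query "login issue"

ranked = sorted(
    stored,
    key=lambda title: cosine_similarity(query_vector, stored[title]),
    reverse=True,
)
for title in ranked:
    print(f"{cosine_similarity(query_vector, stored[title]):.3f}  {title}")
```

The two auth-related titles score close to 1.0 and the CSS change scores far lower, even though none of the titles contains the word "login".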

In simple terms:

SQLite FTS5 = Exact matching (like Ctrl+F)
ChromaDB = Meaning-aware matching (like a colleague who understands what you're asking)

Together, they form Claude-Mem's Hybrid Search, so you can find relevant history no matter what keywords you use.
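How two result lists get merged is not spelled out in this episode, so the following is only an illustration of one well-known approach, reciprocal rank fusion, and not a description of Claude-Mem's actual ranking. The Observation ids are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked id lists: each appearance adds 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

keyword_hits = [4, 7]      # ids returned by the keyword engine, best first
semantic_hits = [7, 9, 4]  # ids returned by the vector engine, best first

merged = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(merged)
```

Ids found by both engines rise to the top, while an id found by only one engine (9 here) still makes the list. That is the practical benefit of a hybrid search over either engine alone.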

3.4 The 6 Observation Types

The Worker automatically classifies each Observation into one of 6 types:

Type	Meaning	Blog Project Example
`decision`	Architecture/design choice	"Chose Prisma over TypeORM"
`bugfix`	Fixed a bug	"Fixed 500 error on duplicate article slug"
`feature`	Built a new feature	"Added article tagging system"
`refactor`	Refactored code	"Migrated routes from pages/ to app/"
`discovery`	Learned something new	"Prisma's findMany doesn't include relations by default"
`change`	General modification	"Updated .env config file"

These type labels enable precise filtering:

search(query="database", type="decision")
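Conceptually, a type filter is just one more condition on the query. The sketch below shows that idea against an in-memory stand-in table; `search_by_type` is a hypothetical helper, the sample rows are made up, and a plain `LIKE` substring match stands in for the real search engine.

```python
import sqlite3

VALID_TYPES = {"decision", "bugfix", "feature", "refactor", "discovery", "change"}

def search_by_type(conn, query, obs_type):
    """Return titles containing `query` among Observations of one type."""
    if obs_type not in VALID_TYPES:
        raise ValueError(f"unknown observation type: {obs_type!r}")
    rows = conn.execute(
        "SELECT title FROM observations "
        "WHERE type = ? AND title LIKE ? ORDER BY id",
        (obs_type, f"%{query}%"),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (id INTEGER PRIMARY KEY, title TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO observations (title, type) VALUES (?, ?)",
    [
        ("Chose SQLite as the blog database", "decision"),
        ("Updated database URL in .env", "change"),
        ("Chose Prisma over TypeORM", "decision"),
    ],
)
decisions = search_by_type(conn, "database", "decision")
print(decisions)
```

Validating the type against the six known labels up front turns a typo such as `"decison"` into an immediate error instead of a silent empty result.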

3.5 Complete Data Flow

graph TB
    A["Claude executes a tool<br/>(read/write file, run command)"] --> B["PostToolUse Hook fires"]
    B --> C["Worker receives raw tool output"]
    C --> D["Claude Agent SDK<br/>compresses & distills"]
    D --> E["Generates structured Observation<br/>{title, narrative, facts,<br/>concepts, type}"]

    E --> F["SQLite"]
    E --> G["ChromaDB"]

    F --> F1["INSERT into observations table"]
    F --> F2["Update FTS5 index"]

    G --> G1["Text → vector embedding"]
    G --> G2["Store in vector collection"]

    H["Next search"] --> I{"Search type?"}
    I -->|"Exact keyword match"| F
    I -->|"Semantic fuzzy match"| G
    I -->|"Hybrid search (default)"| J["FTS5 + vector<br/>results merged & ranked"]

    style A fill:#6366f1,color:#fff
    style F fill:#10b981,color:#fff
    style G fill:#10b981,color:#fff
    style J fill:#f59e0b,color:#000
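The write half of this flow (one Observation fanning out to the table, the FTS5 index, and the vector store) can be sketched as follows. Everything here is a simplified stand-in: the table layout is assumed, a plain dict plays the role of ChromaDB, `fake_embed` is a placeholder rather than a real embedding model, and the sample narrative is invented for the demo.

```python
import sqlite3

def store_observation(conn, vector_index, embed, title, narrative, obs_type):
    """Write one Observation to the table, the FTS5 index, and the vector index."""
    with conn:  # both SQL writes commit together or roll back together
        cursor = conn.execute(
            "INSERT INTO observations (title, narrative, type) VALUES (?, ?, ?)",
            (title, narrative, obs_type),
        )
        obs_id = cursor.lastrowid
        conn.execute(
            "INSERT INTO observations_fts (rowid, title, narrative) VALUES (?, ?, ?)",
            (obs_id, title, narrative),
        )
    # Stand-in for "text -> vector embedding" plus "store in vector collection".
    vector_index[obs_id] = embed(f"{title}\n{narrative}")
    return obs_id

def fake_embed(text):
    # Placeholder only: a real model returns a long list of learned floats.
    return [float(len(text)), float(text.count(" "))]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations "
    "(id INTEGER PRIMARY KEY, title TEXT, narrative TEXT, type TEXT)"
)
conn.execute("CREATE VIRTUAL TABLE observations_fts USING fts5(title, narrative)")

vector_index = {}
new_id = store_observation(
    conn, vector_index, fake_embed,
    "Fix foreign key error",
    "The developer corrected the Comment model relation.",
    "bugfix",
)
keyword_hits = [
    row[0]
    for row in conn.execute(
        "SELECT rowid FROM observations_fts WHERE observations_fts MATCH ?",
        ("foreign",),
    )
]
print(new_id, keyword_hits, sorted(vector_index))
```

Reusing the same id as the FTS5 `rowid` and as the vector key is what lets a later search join keyword hits and semantic hits back to the same Observation row.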

Hands-On Exercise

Work on the blog project in Claude Code for one session (e.g., create the database schema).
After the session, query the database directly:

sqlite3 ~/.claude-mem/claude-mem.db
SELECT COUNT(*) FROM sdk_sessions;
SELECT id, title, type FROM observations ORDER BY created_at DESC LIMIT 10;
SELECT request, completed, next_steps FROM session_summaries LIMIT 1;
.quit

Open http://localhost:37777 and compare the Web UI with your SQL results — they should match.

Coming Up Next

Data is stored, but how does it get captured "automatically"? Next episode, we reveal Claude-Mem's 5 lifecycle hooks — the invisible secretary that "takes notes behind your back." We'll trace every step from keypress to database insert with a full sequence diagram.

➡️ Episode 4: Lifecycle Hooks — How Claude-Mem Takes Notes Behind Your Back
