Ep 16: Knowledge Is Power — RAG Architecture & Embedding Fundamentals

⏱ Est. reading time: 11 min · Updated on 4/9/2026

Why Agents Need RAG

LLMs have two fatal flaws: a training-data knowledge cutoff and no access to your private data. RAG (Retrieval-Augmented Generation) fixes both.

graph TB
    subgraph "Without RAG"
        Q1["User: What's AgentUpdate's refund policy?"]
        Q1 --> LLM1["🤖 GPT-4o"]
        LLM1 --> A1["Sorry, I don't know ❌"]
    end
    subgraph "With RAG"
        Q2["Same question"]
        Q2 --> Search["🔍 Retrieve from KB
refund_policy.pdf → relevant chunks"]
        Search --> LLM2["🤖 GPT-4o + context"]
        LLM2 --> A2["Per our policy, full refund within 30 days... ✅"]
    end
    style Search fill:#22c55e,stroke:#16a34a,color:#fff

The RAG Pipeline

graph TB
    subgraph "Phase 1: Indexing (Offline)"
        Doc[📄 Documents] --> Chunk[✂️ Chunking]
        Chunk --> Embed[🧮 Embedding Model]
        Embed --> Store[💾 Vector Database]
    end
    subgraph "Phase 2: Retrieval (Online)"
        Query[❓ User Question] --> QEmbed[🧮 Question → Vector]
        QEmbed --> Sim[📐 Similarity Search]
        Sim --> TopK["📑 Top-K Chunks"]
        TopK --> LLM[🤖 LLM Generates Answer]
    end
    style Embed fill:#8b5cf6,stroke:#7c3aed,color:#fff
    style Sim fill:#f59e0b,stroke:#d97706,color:#fff

1. What Is an Embedding?

// "How to get a refund?"    → [0.023, -0.147, 0.892, ...]
// "Return process?"         → [0.021, -0.152, 0.889, ...]  // Very close!
// "What's the weather?"     → [0.567, 0.234, -0.102, ...]  // Very far!

// Cosine similarity:
// cos("refund", "return") ≈ 0.95  → Highly similar ✅
// cos("refund", "weather") ≈ 0.12 → Unrelated ❌
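The score above is plain cosine similarity: the dot product of two vectors divided by the product of their lengths. A minimal JavaScript version — the 3-D vectors are toy stand-ins for real 1536-dimensional embeddings, not actual model output:

```javascript
// Cosine similarity: 1 = same direction (same meaning),
// 0 = orthogonal (unrelated), negative = opposing.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-D stand-ins for the three sentences above:
const refund  = [0.023, -0.147,  0.892];
const returns = [0.021, -0.152,  0.889];
const weather = [0.567,  0.234, -0.102];

cosineSimilarity(refund, returns); // very close to 1 — same meaning
cosineSimilarity(refund, weather); // low / negative — unrelated
```

Note that the score depends only on direction, not magnitude, which is why embeddings from the same model are directly comparable.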

| Model | Dimensions | Price | Notes |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02/1M | Best value |
| text-embedding-3-large | 3072 | $0.13/1M | Higher precision |
| text-embedding-004 (Google) | 768 | Free | Google ecosystem |
| nomic-embed-text (Ollama) | 768 | Free | Air-gapped |

2. Chunking Strategy

// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Recommended n8n Text Splitter settings
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Mode: "Recursive Character"
// Chunk Size: 800
// Chunk Overlap: 200    ← Prevents key info from being split at boundaries
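As a rough illustration of what those two numbers do, here is a simplified sliding-window chunker (`chunkText` is a hypothetical helper, not the n8n node — the real Recursive Character splitter additionally prefers breaking on paragraph, line, and word boundaries before cutting mid-word):

```javascript
// Fixed-size chunking with overlap: each window starts
// (chunkSize - overlap) characters after the previous one, so every
// consecutive pair of chunks shares `overlap` characters.
function chunkText(text, chunkSize = 800, overlap = 200) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - overlap; // step forward, re-covering the overlap
  }
  return chunks;
}

// A 2000-char document yields 3 chunks: [0,800), [600,1400), [1200,2000).
// A sentence sitting on a boundary survives whole in at least one chunk.
const chunks = chunkText("x".repeat(2000));
```

The overlap is the whole point: without it, a fact straddling character 800 would be split across two chunks and retrievable from neither.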

3. Vector Database Comparison

| Database | Deployment | Best For | Free Tier |
|---|---|---|---|
| Qdrant | Docker self-host | Dev/Prod | Self-hosted free |
| Pinecone | Cloud | Production | 100K vectors |
| Supabase | Cloud/Self-host | All stages | 500MB |
| In-Memory | n8n built-in | Dev only | — |
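Whichever store you pick, the online half of the pipeline (Phase 2) reduces to the same operation: embed the question, rank stored chunk vectors by similarity, keep the top-K. A toy in-memory sketch — `topK` and the 3-D vectors are made up for illustration; a real vector DB does this over millions of 1536-D vectors using an approximate-nearest-neighbor index rather than a full scan:

```javascript
// Cosine similarity, as in section 1.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-K: score every chunk, sort, keep the best k.
function topK(queryVector, index, k = 2) {
  return index
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Hypothetical output of Phase 1 (indexing):
const index = [
  { text: "Full refund within 30 days of purchase.", vector: [0.02, -0.15,  0.89] },
  { text: "Standard delivery takes 3-5 days.",       vector: [0.55,  0.21, -0.10] },
  { text: "Returns are processed within 48 hours.",  vector: [0.02, -0.14,  0.88] },
];

// Query vector for "How to get a refund?" — the two refund/return
// chunks win and become the context passed to the LLM.
const hits = topK([0.023, -0.147, 0.892], index);
```

Swapping the in-memory array for Qdrant or Pinecone changes the storage and search call, not the shape of this flow.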

Next Episode

In Ep 17, we build a complete Qdrant + document indexing pipeline to populate a knowledge base from PDFs and Markdown files.