Ep 16: Knowledge Is Power — RAG Architecture & Embedding Fundamentals

⏱ Est. reading time: 11 min · Updated on 4/9/2026

Why Agents Need RAG

LLMs have two fatal flaws: a training-data knowledge cutoff and no access to your private data. RAG (Retrieval-Augmented Generation) fixes both.

graph TB
    subgraph "Without RAG"
        Q1["User: What's AgentUpdate's refund policy?"]
        Q1 --> LLM1["🤖 GPT-4o"]
        LLM1 --> A1["Sorry, I don't know ❌"]
    end
    subgraph "With RAG"
        Q2["Same question"]
        Q2 --> Search["🔍 Retrieve from KB
refund_policy.pdf → relevant chunks"]
        Search --> LLM2["🤖 GPT-4o + context"]
        LLM2 --> A2["Per our policy, full refund within 30 days... ✅"]
    end
    style Search fill:#22c55e,stroke:#16a34a,color:#fff

The RAG Pipeline

graph TB
    subgraph "Phase 1: Indexing (Offline)"
        Doc[📄 Documents] --> Chunk[✂️ Chunking]
        Chunk --> Embed[🧮 Embedding Model]
        Embed --> Store[💾 Vector Database]
    end
    subgraph "Phase 2: Retrieval (Online)"
        Query[❓ User Question] --> QEmbed[🧮 Question → Vector]
        QEmbed --> Sim[📐 Similarity Search]
        Sim --> TopK["📑 Top-K Chunks"]
        TopK --> LLM[🤖 LLM Generates Answer]
    end
    style Embed fill:#8b5cf6,stroke:#7c3aed,color:#fff
    style Sim fill:#f59e0b,stroke:#d97706,color:#fff

1. What Is an Embedding?

// "How to get a refund?"    → [0.023, -0.147, 0.892, ...]
// "Return process?"         → [0.021, -0.152, 0.889, ...]  // Very close!
// "What's the weather?"     → [0.567, 0.234, -0.102, ...]  // Very far!

// Cosine similarity:
// cos("refund", "return") ≈ 0.95  → Highly similar ✅
// cos("refund", "weather") ≈ 0.12 → Unrelated ❌
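The score above is plain cosine similarity: the dot product of two vectors divided by the product of their lengths. A minimal JavaScript version — the 3-D vectors are toy stand-ins for real 1536-dimensional embeddings, not actual model output:

```javascript
// Cosine similarity: 1 = same direction (same meaning),
// 0 = orthogonal (unrelated), negative = opposing.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-D stand-ins for the three sentences above:
const refund  = [0.023, -0.147,  0.892];
const returns = [0.021, -0.152,  0.889];
const weather = [0.567,  0.234, -0.102];

cosineSimilarity(refund, returns); // very close to 1 — same meaning
cosineSimilarity(refund, weather); // low / negative — unrelated
```

Note that the score depends only on direction, not magnitude, which is why embeddings from the same model are directly comparable.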

| Model | Dimensions | Price | Notes |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02/1M | Best value |
| text-embedding-3-large | 3072 | $0.13/1M | Higher precision |
| text-embedding-004 (Google) | 768 | Free | Google ecosystem |
| nomic-embed-text (Ollama) | 768 | Free | Air-gapped |

2. Chunking Strategy

// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Recommended n8n Text Splitter settings
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Mode: "Recursive Character"
// Chunk Size: 800
// Chunk Overlap: 200    ← Prevents key info from being split at boundaries
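As a rough illustration of what those two numbers do, here is a simplified sliding-window chunker (`chunkText` is a hypothetical helper, not the n8n node — the real Recursive Character splitter additionally prefers breaking on paragraph, line, and word boundaries before cutting mid-word):

```javascript
// Fixed-size chunking with overlap: each window starts
// (chunkSize - overlap) characters after the previous one, so every
// consecutive pair of chunks shares `overlap` characters.
function chunkText(text, chunkSize = 800, overlap = 200) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - overlap; // step forward, re-covering the overlap
  }
  return chunks;
}

// A 2000-char document yields 3 chunks: [0,800), [600,1400), [1200,2000).
// A sentence sitting on a boundary survives whole in at least one chunk.
const chunks = chunkText("x".repeat(2000));
```

The overlap is the whole point: without it, a fact straddling character 800 would be split across two chunks and retrievable from neither.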

3. Vector Database Comparison

| Database | Deployment | Best For | Free Tier |
|---|---|---|---|
| Qdrant | Docker self-host | Dev/Prod | Self-hosted free |
| Pinecone | Cloud | Production | 100K vectors |
| Supabase | Cloud/Self-host | All stages | 500MB |
| In-Memory | n8n built-in | Dev only | — |
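Whichever store you pick, the online half of the pipeline (Phase 2) reduces to the same operation: embed the question, rank stored chunk vectors by similarity, keep the top-K. A toy in-memory sketch — `topK` and the 3-D vectors are made up for illustration; a real vector DB does this over millions of 1536-D vectors using an approximate-nearest-neighbor index rather than a full scan:

```javascript
// Cosine similarity, as in section 1.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-K: score every chunk, sort, keep the best k.
function topK(queryVector, index, k = 2) {
  return index
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Hypothetical output of Phase 1 (indexing):
const index = [
  { text: "Full refund within 30 days of purchase.", vector: [0.02, -0.15,  0.89] },
  { text: "Standard delivery takes 3-5 days.",       vector: [0.55,  0.21, -0.10] },
  { text: "Returns are processed within 48 hours.",  vector: [0.02, -0.14,  0.88] },
];

// Query vector for "How to get a refund?" — the two refund/return
// chunks win and become the context passed to the LLM.
const hits = topK([0.023, -0.147, 0.892], index);
```

Swapping the in-memory array for Qdrant or Pinecone changes the storage and search call, not the shape of this flow.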

Next Episode

In Ep 17, we build a complete Qdrant + document indexing pipeline to populate a knowledge base from PDFs and Markdown files.