Ep 16: Knowledge Is Power — RAG Architecture & Embedding Fundamentals
Why Agents Need RAG
LLMs have two fundamental limitations: a training-data knowledge cutoff and no access to your private data. RAG (Retrieval-Augmented Generation) fixes both.
graph TB
subgraph "Without RAG"
Q1["User: What's AgentUpdate's refund policy?"]
Q1 --> LLM1["🤖 GPT-4o"]
LLM1 --> A1["Sorry, I don't know ❌"]
end
subgraph "With RAG"
Q2["Same question"]
Q2 --> Search["🔍 Retrieve from KB
refund_policy.pdf → relevant chunks"]
Search --> LLM2["🤖 GPT-4o + context"]
LLM2 --> A2["Per our policy, full refund within 30 days... ✅"]
end
style Search fill:#22c55e,stroke:#16a34a,color:#fff
The RAG Pipeline
graph TB
subgraph "Phase 1: Indexing (Offline)"
Doc[📄 Documents] --> Chunk[✂️ Chunking]
Chunk --> Embed[🧮 Embedding Model]
Embed --> Store[💾 Vector Database]
end
subgraph "Phase 2: Retrieval (Online)"
Query[❓ User Question] --> QEmbed[🧮 Question → Vector]
QEmbed --> Sim[📐 Similarity Search]
Sim --> TopK["📑 Top-K Chunks"]
TopK --> LLM[🤖 LLM Generates Answer]
end
style Embed fill:#8b5cf6,stroke:#7c3aed,color:#fff
style Sim fill:#f59e0b,stroke:#d97706,color:#fff
1. What Is an Embedding?
// "How to get a refund?" → [0.023, -0.147, 0.892, ...]
// "Return process?" → [0.021, -0.152, 0.889, ...] // Very close!
// "What's the weather?" → [0.567, 0.234, -0.102, ...] // Very far!
// Cosine similarity:
// cos("refund", "return") ≈ 0.95 → Highly similar ✅
// cos("refund", "weather") ≈ 0.12 → Unrelated ❌
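The similarity scores above come from the cosine of the angle between two vectors. A minimal sketch (the three-dimensional vectors are toy values; real embeddings have 768–3072 dimensions):

```javascript
// Cosine similarity: cos(a, b) = (a · b) / (|a| * |b|)
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors mimicking the example above
const refund = [0.023, -0.147, 0.892];
const returns = [0.021, -0.152, 0.889];
const weather = [0.567, 0.234, -0.102];

console.log(cosineSimilarity(refund, returns)); // high: near-identical direction
console.log(cosineSimilarity(refund, weather)); // low: unrelated direction
```

The score ranges from -1 to 1; vectors pointing the same way score near 1 regardless of their length, which is why cosine (rather than raw distance) is the default metric for text embeddings.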
| Model | Dimensions | Price | Notes |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02/1M tokens | Best value |
| text-embedding-3-large | 3072 | $0.13/1M tokens | Higher precision |
| text-embedding-004 (Google) | 768 | Free | Google ecosystem |
| nomic-embed-text (Ollama) | 768 | Free | Air-gapped |
2. Chunking Strategy
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Recommended n8n Text Splitter settings
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Mode: "Recursive Character"
// Chunk Size: 800
// Chunk Overlap: 200 ← Prevents key info from being split at boundaries
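To see what those settings do, here is a simplified sliding-window splitter. Note this is a sketch of the size/overlap mechanics only: n8n's actual "Recursive Character" mode (via LangChain) additionally prefers breaking on paragraph and sentence boundaries before falling back to raw character positions.

```javascript
// Simplified character splitter: fixed-size windows that overlap so a
// sentence cut at one chunk's boundary survives intact in the next chunk.
function chunkText(text, chunkSize = 800, overlap = 200) {
  const chunks = [];
  const step = chunkSize - overlap; // each chunk starts 600 chars after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached the end
  }
  return chunks;
}

const doc = "ABCDEFGHIJ".repeat(200); // 2000-char stand-in document
const chunks = chunkText(doc);
console.log(chunks.length);    // 3
console.log(chunks[0].length); // 800
```

With size 800 and overlap 200, the last 200 characters of each chunk are repeated at the start of the next one, so a fact straddling a boundary is always fully contained in at least one chunk.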
3. Vector Database Comparison
| Database | Deployment | Best For | Free Tier |
|---|---|---|---|
| Qdrant | Docker self-host | Dev/Prod | Self-hosted free |
| Pinecone | Cloud | Production | 100K vectors |
| Supabase | Cloud/Self | All stages | 500MB |
| In-Memory | n8n built-in | Dev only | — |
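All four options implement the same core operation: store (text, vector) pairs and return the K nearest vectors to a query. A toy in-memory version, with hand-made vectors standing in for real embedding-model output:

```javascript
// Cosine similarity, as in the embedding section above
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / Math.sqrt(na * nb);
}

// Hypothetical indexed chunks; vectors are toy 3-d values
const store = [
  { text: "Full refund within 30 days of purchase.", vector: [0.9, 0.1, 0.0] },
  { text: "Shipping takes 3-5 business days.",       vector: [0.1, 0.9, 0.1] },
  { text: "Support is available 24/7 via chat.",     vector: [0.0, 0.2, 0.9] },
];

// Score every stored chunk against the query vector, return the K best
function topK(queryVector, k) {
  return store
    .map((doc) => ({ ...doc, score: cosine(queryVector, doc.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// A query vector pointing in the "refund" direction:
const results = topK([0.85, 0.15, 0.05], 2);
console.log(results[0].text); // the refund chunk ranks first
```

This brute-force scan is exactly what the n8n in-memory store does; Qdrant and Pinecone exist because scanning millions of vectors per query is too slow, so they add approximate-nearest-neighbor indexes, persistence, and filtering on top.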
Next Episode
In Ep 17, we build a complete Qdrant + document indexing pipeline to populate a knowledge base from PDFs and Markdown files.