Ep 17: Injecting Knowledge — Qdrant Deployment & Document Indexing Pipeline
Episode Goal
Build a complete document indexing pipeline that loads PDF and Markdown files, chunks and embeds them, and stores the vectors in a Qdrant vector database.
graph LR
PDF[📄 PDF] & MD[📝 Markdown] --> Load[📥 Load] --> Split[✂️ Chunk] --> Embed[🧮 Embed] --> QD[💾 Qdrant]
style QD fill:#22c55e,stroke:#16a34a,color:#fff
1. Deploy Qdrant with Docker
# Add to your docker-compose.yml alongside n8n
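services:
  qdrant:
    image: qdrant/qdrant:latest
    container_name: n8n_qdrant
    restart: unless-stopped
    ports:
      - "6333:6333" # REST API
      - "6334:6334" # gRPC (high performance)
    volumes:
      - qdrant_data:/qdrant/storage
    networks:
      - n8n_network # Same network as n8n

# Named volumes must be declared under the top-level volumes: key
# (merge this entry into the existing key if your file already has one)
volumes:
  qdrant_data: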
# Verify Qdrant
curl http://localhost:6333/healthz
# Expected: healthz check passed
curl http://localhost:6333/
# Expected (abridged): {"title":"qdrant - vector search engine","version":"1.x.x"}
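The container can take a few seconds to accept connections after `docker compose up`, so a single curl may fail on the first try. Below is a minimal sketch of a readiness check that polls the same `/healthz` endpoint. The function names `http_get` and `wait_for_qdrant`, the retry count, and the delay are assumptions chosen for illustration, not part of n8n or Qdrant.

```python
import time
import urllib.error
import urllib.request


def http_get(url: str, timeout: float = 2.0) -> tuple[int, str]:
    """Return (status_code, body) for a GET request using only the stdlib."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def wait_for_qdrant(base_url: str, attempts: int = 10, delay: float = 1.0,
                    fetch=http_get, sleep=time.sleep) -> bool:
    """Poll <base_url>/healthz until it answers 200 or the attempts run out."""
    url = base_url.rstrip("/") + "/healthz"
    for attempt in range(attempts):
        try:
            status, _body = fetch(url)
            if status == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # container still starting, or the port is not reachable yet
        if attempt < attempts - 1:
            sleep(delay)  # hypothetical fixed delay; no backoff in this sketch
    return False
```

From the host, call `wait_for_qdrant("http://localhost:6333")`. From inside the n8n container, `localhost` points at n8n itself, so the address is normally the Compose service name on the shared network, for example `http://qdrant:6333`.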
2. Indexing Workflow
graph TB
Trigger[⚡ Manual Trigger] --> Read[📂 Read Files]
Read --> Extract[📄 Extract Text]
Extract --> Meta[⚙️ Set: Add Metadata]
Meta --> Split[✂️ Text Splitter<br/>Chunk: 800, Overlap: 200]
Split --> QD[💾 Qdrant Insert]
subgraph "Sub-nodes"
QD --> Emb[🧮 OpenAI Embeddings]
end
style QD fill:#22c55e,stroke:#16a34a,color:#fff
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Key configurations
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Set Node — Add metadata for filtering during retrieval:
// source: {{ $json.fileName }}
// category: "product-docs"
// indexedAt: {{ $now.toISO() }}
// Qdrant Vector Store:
// Mode: Insert Documents
// Collection: "knowledge-base"
// Dimension: 1536 // Must match the embedding model (e.g. OpenAI text-embedding-3-small outputs 1536)
// ⚠️ Collection auto-created if it doesn't exist
// ⚠️ Dimension mismatch = insert error
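To make the splitter and metadata settings concrete, here is a rough Python sketch of what the pipeline does to one document. It assumes a plain fixed-size sliding window (800 characters, 200 overlap). The Text Splitter node in n8n may instead break on separators such as paragraphs, so real chunk boundaries and counts can differ. The helper names `chunk_text`, `build_records`, and `check_dimension` are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timezone

CHUNK_SIZE = 800      # characters per chunk, as set on the Text Splitter
CHUNK_OVERLAP = 200   # characters shared between neighbouring chunks
VECTOR_DIM = 1536     # must equal the embedding model's output size


def chunk_text(text: str, size: int = CHUNK_SIZE,
               overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Sliding window: each chunk starts (size - overlap) chars after the last."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def build_records(text: str, file_name: str,
                  category: str = "product-docs") -> list[dict]:
    """Attach the metadata fields from the Set node to every chunk."""
    indexed_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "text": chunk,
            "metadata": {
                "source": file_name,
                "category": category,
                "indexedAt": indexed_at,
            },
        }
        for chunk in chunk_text(text)
    ]


def check_dimension(vector: list[float], expected: int = VECTOR_DIM) -> None:
    """Fail early instead of letting Qdrant reject the insert."""
    if len(vector) != expected:
        raise ValueError(
            f"vector has {len(vector)} dimensions, collection expects {expected}"
        )
```

With these settings the window advances 600 characters at a time, so a 3000-character text produces 5 chunks: four of 800 characters and a final one of 600. Every chunk carries the same `source`, `category`, and `indexedAt` values, which is what makes filtering by document possible at retrieval time.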
3. Data Flow Sequence
sequenceDiagram
participant File as 📄 refund_policy.pdf
participant Extract as 📄 Extract
participant Split as ✂️ Splitter
participant Embed as 🧮 Embedding
participant QD as 💾 Qdrant
File->>Extract: Binary PDF
Extract->>Split: Plain text (3000 chars)
Split->>Split: Split into 5 chunks
loop Each chunk
Split->>Embed: Chunk text
Embed-->>QD: Vector + metadata
end
Note over QD: 5 vector records stored
Next Episode
In Ep 18, we build the retrieval side: connecting Qdrant as an Agent Tool so the AI can search the knowledge base in real time.