Ep 19: Retrieval Refined — Hybrid Search, Re-Ranking & Multi-Query

⏱ Est. reading time: 10 min · Updated on 4/9/2026

Three Pain Points of Basic RAG

graph TB
    P1["😤 Semantic Drift
User says 'return', docs say 'refund'"]
    P2["😤 Noisy Results
Top-K has irrelevant chunks"]
    P3["😤 Single Angle
Complex query needs multi-faceted search"]
    P1 --> S1["✅ Hybrid Search"]
    P2 --> S2["✅ Re-Ranking"]
    P3 --> S3["✅ Multi-Query"]
    style S1 fill:#22c55e,stroke:#16a34a,color:#fff
    style S2 fill:#22c55e,stroke:#16a34a,color:#fff
    style S3 fill:#22c55e,stroke:#16a34a,color:#fff

1. Hybrid Search

Combines vector semantic search with keyword search (BM25), taking the union of both result sets.

graph TB
    Query["❓ 'How to return items?'"]
    Query --> Semantic["🧮 Vector Search
'return items' ≈ 'refund process'"]
    Query --> Keyword["🔤 BM25 Search
Exact match: 'return items'"]
    Semantic --> Merge["🔗 Merge + Dedup"]
    Keyword --> Merge
    Merge --> Result["📑 Comprehensive results"]
    style Merge fill:#f59e0b,stroke:#d97706,color:#fff
// Qdrant native Hybrid Search with weight tuning:
// vector_weight: 0.7    // Semantic: 70%
// keyword_weight: 0.3   // BM25: 30%
// Tech docs → higher keyword weight (exact terms matter)
// General CS → higher semantic weight (varied user phrasing)
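The weighted merge itself can be sketched as follows. This is a minimal illustration, not Qdrant's internal implementation: it assumes both searches return `{ id, score }` hits with scores already normalized to 0..1, and the `hybridMerge` helper and document IDs are hypothetical.

```javascript
// Fuse vector and BM25 hits with the 0.7 / 0.3 weights from the config
// above. The Map dedups documents that appear in both result lists.
function hybridMerge(vectorHits, keywordHits, vectorWeight = 0.7, keywordWeight = 0.3) {
  const fused = new Map();
  for (const { id, score } of vectorHits) {
    fused.set(id, (fused.get(id) || 0) + vectorWeight * score);
  }
  for (const { id, score } of keywordHits) {
    fused.set(id, (fused.get(id) || 0) + keywordWeight * score);
  }
  // Sort by fused score, descending
  return [...fused.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

const vec = [{ id: 'refund-policy', score: 0.9 }, { id: 'shipping-faq', score: 0.6 }];
const kw  = [{ id: 'returns-howto', score: 0.8 }, { id: 'refund-policy', score: 0.7 }];
console.log(hybridMerge(vec, kw).map(h => h.id));
// → ['refund-policy', 'shipping-faq', 'returns-howto']
```

A document that matches both searches ('refund-policy' here) accumulates score from each list, which is why hybrid search surfaces it above single-channel hits.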

2. Re-Ranking

Fetch Top-20 coarse results, then use a dedicated re-ranking model to refine them down to the Top-5.

// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Cohere Re-Rank API in n8n Code Node
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// `query` (the user's question, a string) and `chunks` (the Top-20
// coarse results, an array of strings) come from upstream nodes.
const response = await fetch('https://api.cohere.ai/v1/rerank', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${$env.COHERE_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'rerank-v3.5',
    query: query,
    documents: chunks,
    top_n: 5           // Keep only top 5
  })
});

// Each result carries the original chunk index plus a relevance score;
// map the indices back to the chunk texts for the next node.
const { results } = await response.json();
const topChunks = results.map(r => chunks[r.index]);

3. Multi-Query

LLM rewrites the original question into 3-5 sub-queries from different angles.

graph TB
    Original["❓ 'n8n Webhook not working on AWS'"]
    Original --> LLM["🧠 Rewrite into 3 sub-queries"]
    LLM --> Q1["🔍 'n8n webhook configuration'"]
    LLM --> Q2["🔍 'AWS security group port 5678'"]
    LLM --> Q3["🔍 'WEBHOOK_URL environment variable'"]
    Q1 & Q2 & Q3 --> VDB["💾 Qdrant"] --> Merge["🔗 Union + Dedup"]
    style LLM fill:#8b5cf6,stroke:#7c3aed,color:#fff
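The fan-out above can be sketched in an n8n Code Node. The rewrite prompt and the `search()` helper are assumptions for illustration; the "Union + Dedup" step from the diagram is shown as a concrete function.

```javascript
// Prompt for the LLM rewrite step (hypothetical wording)
const REWRITE_PROMPT = `Rewrite the user's question into 3 short search
queries covering different angles. Return one query per line.`;

// Union results from all sub-queries, keeping first-seen order and
// dropping duplicates — the "Union + Dedup" node in the diagram.
function unionDedup(resultLists) {
  const seen = new Set();
  const merged = [];
  for (const list of resultLists) {
    for (const id of list) {
      if (!seen.has(id)) { seen.add(id); merged.push(id); }
    }
  }
  return merged;
}

// Example with hypothetical chunk IDs returned per sub-query:
console.log(unionDedup([
  ['webhook-setup', 'port-5678'],
  ['port-5678', 'security-groups'],
  ['env-vars', 'webhook-setup'],
]));
// → ['webhook-setup', 'port-5678', 'security-groups', 'env-vars']
```

Because each sub-query probes a different facet of the problem, the union typically covers more relevant chunks than any single query; dedup keeps the merged context from repeating itself.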

Optimization Comparison

| Strategy    | Complexity | Improvement    | Extra Cost     | Best For           |
|-------------|------------|----------------|----------------|--------------------|
| Hybrid      | ⭐⭐        | Recall +30%    | Minimal        | Technical docs     |
| Re-Rank     | ⭐⭐⭐      | Precision +40% | Cohere API     | High-quality needs |
| Multi-Query | ⭐⭐        | Coverage +50%  | Extra LLM call | Complex questions  |

Next Episode

Ep 20 builds an enterprise RAG management system with auto-incremental updates, stale document cleanup, and retrieval quality monitoring.