Ep 19: Retrieval Refined — Hybrid Search, Re-Ranking & Multi-Query

⏱ Est. reading time: 10 min · Updated on 4/9/2026

Three Pain Points of Basic RAG

graph TB
    P1["😤 Semantic Drift
User says 'return', docs say 'refund'"]
    P2["😤 Noisy Results
Top-K has irrelevant chunks"]
    P3["😤 Single Angle
Complex query needs multi-faceted search"]
    P1 --> S1["✅ Hybrid Search"]
    P2 --> S2["✅ Re-Ranking"]
    P3 --> S3["✅ Multi-Query"]
    style S1 fill:#22c55e,stroke:#16a34a,color:#fff
    style S2 fill:#22c55e,stroke:#16a34a,color:#fff
    style S3 fill:#22c55e,stroke:#16a34a,color:#fff

1. Hybrid Search

Combines vector semantic search with keyword search (BM25), taking the union of both result sets.

graph TB
    Query["❓ 'How to return items?'"]
    Query --> Semantic["🧮 Vector Search
'return items' ≈ 'refund process'"]
    Query --> Keyword["🔤 BM25 Search
Exact match: 'return items'"]
    Semantic --> Merge["🔗 Merge + Dedup"]
    Keyword --> Merge
    Merge --> Result["📑 Comprehensive results"]
    style Merge fill:#f59e0b,stroke:#d97706,color:#fff
// Qdrant native Hybrid Search with weight tuning:
// vector_weight: 0.7    // Semantic: 70%
// keyword_weight: 0.3   // BM25: 30%
// Tech docs → higher keyword weight (exact terms matter)
// General CS → higher semantic weight (varied user phrasing)
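The weighted merge itself can be sketched as follows. This is a minimal illustration, not Qdrant's internal implementation: it assumes both searches return `{ id, score }` hits with scores already normalized to 0..1, and the `hybridMerge` helper and document IDs are hypothetical.

```javascript
// Fuse vector and BM25 hits with the 0.7 / 0.3 weights from the config
// above. The Map dedups documents that appear in both result lists.
function hybridMerge(vectorHits, keywordHits, vectorWeight = 0.7, keywordWeight = 0.3) {
  const fused = new Map();
  for (const { id, score } of vectorHits) {
    fused.set(id, (fused.get(id) || 0) + vectorWeight * score);
  }
  for (const { id, score } of keywordHits) {
    fused.set(id, (fused.get(id) || 0) + keywordWeight * score);
  }
  // Sort by fused score, descending
  return [...fused.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

const vec = [{ id: 'refund-policy', score: 0.9 }, { id: 'shipping-faq', score: 0.6 }];
const kw  = [{ id: 'returns-howto', score: 0.8 }, { id: 'refund-policy', score: 0.7 }];
console.log(hybridMerge(vec, kw).map(h => h.id));
// → ['refund-policy', 'shipping-faq', 'returns-howto']
```

A document that matches both searches ('refund-policy' here) accumulates score from each list, which is why hybrid search surfaces it above single-channel hits.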

2. Re-Ranking

Fetch Top-20 coarse results, then use a dedicated re-ranking model to refine them down to the Top-5.

// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Cohere Re-Rank API in n8n Code Node
// ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// `query` (the user's question, a string) and `chunks` (the Top-20
// coarse results, an array of strings) come from upstream nodes.
const response = await fetch('https://api.cohere.ai/v1/rerank', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${$env.COHERE_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'rerank-v3.5',
    query: query,
    documents: chunks,
    top_n: 5           // Keep only top 5
  })
});

// Each result carries the original chunk index plus a relevance score;
// map the indices back to the chunk texts for the next node.
const { results } = await response.json();
const topChunks = results.map(r => chunks[r.index]);

3. Multi-Query

LLM rewrites the original question into 3-5 sub-queries from different angles.

graph TB
    Original["❓ 'n8n Webhook not working on AWS'"]
    Original --> LLM["🧠 Rewrite into 3 sub-queries"]
    LLM --> Q1["🔍 'n8n webhook configuration'"]
    LLM --> Q2["🔍 'AWS security group port 5678'"]
    LLM --> Q3["🔍 'WEBHOOK_URL environment variable'"]
    Q1 & Q2 & Q3 --> VDB["💾 Qdrant"] --> Merge["🔗 Union + Dedup"]
    style LLM fill:#8b5cf6,stroke:#7c3aed,color:#fff
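The fan-out above can be sketched in an n8n Code Node. The rewrite prompt and the `search()` helper are assumptions for illustration; the "Union + Dedup" step from the diagram is shown as a concrete function.

```javascript
// Prompt for the LLM rewrite step (hypothetical wording)
const REWRITE_PROMPT = `Rewrite the user's question into 3 short search
queries covering different angles. Return one query per line.`;

// Union results from all sub-queries, keeping first-seen order and
// dropping duplicates — the "Union + Dedup" node in the diagram.
function unionDedup(resultLists) {
  const seen = new Set();
  const merged = [];
  for (const list of resultLists) {
    for (const id of list) {
      if (!seen.has(id)) { seen.add(id); merged.push(id); }
    }
  }
  return merged;
}

// Example with hypothetical chunk IDs returned per sub-query:
console.log(unionDedup([
  ['webhook-setup', 'port-5678'],
  ['port-5678', 'security-groups'],
  ['env-vars', 'webhook-setup'],
]));
// → ['webhook-setup', 'port-5678', 'security-groups', 'env-vars']
```

Because each sub-query probes a different facet of the problem, the union typically covers more relevant chunks than any single query; dedup keeps the merged context from repeating itself.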

Optimization Comparison

| Strategy    | Complexity | Improvement    | Extra Cost     | Best For           |
|-------------|------------|----------------|----------------|--------------------|
| Hybrid      | ⭐⭐        | Recall +30%    | Minimal        | Technical docs     |
| Re-Rank     | ⭐⭐⭐      | Precision +40% | Cohere API     | High-quality needs |
| Multi-Query | ⭐⭐        | Coverage +50%  | Extra LLM call | Complex questions  |

Next Episode

Ep 20 builds an enterprise RAG management system with auto-incremental updates, stale document cleanup, and retrieval quality monitoring.