Google Embeddings 2 Leads in Multilingual Dense Retrieval and RAG, Open-Source Alternatives Excel in Latency

A recent benchmark study evaluates Google Embeddings 2 (GE2), a bi-encoder hosted on Vertex AI with a 2,048-token context window and explicit task-type conditioning. The study compares GE2 against five widely used open-source models: BGE-M3, E5-large, Multilingual-E5-large (mE5-L), LaBSE, and Paraphrase-Multilingual-MPNet (mMPNet).
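
In a bi-encoder, queries and documents are encoded independently into dense vectors, and retrieval ranks documents by their similarity to the query vector. The minimal sketch below assumes cosine similarity as the scoring function and uses toy, hand-written 3-dimensional vectors in place of real GE2 or open-source model outputs. `cosine` and `rank_documents` are hypothetical helpers, not part of any model's API.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 10):
    """Return (doc_id, score) pairs for the k documents most similar to the query.

    In a bi-encoder, doc_vecs are computed once, offline; only the query
    has to be encoded at request time.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real model outputs (hypothetical values).
docs = {
    "d1": [0.9, 0.1, 0.0],
    "d2": [0.1, 0.9, 0.1],
    "d3": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]
print(rank_documents(query, docs, k=2))  # d1 ranks first, d3 second
```

Because document vectors can be precomputed and indexed, query-time cost is dominated by encoding the query, which is why per-query latency is the figure that matters for the comparison that follows.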

The evaluation covered four BEIR subsets, a synthetic Italian RAG corpus (IT-RAG-Bench), a chunking ablation over five chunk sizes (in tokens) and three chunking strategies, and per-query latency measurements on commodity CPU hardware.

GE2 ranked first on every task, with an average nDCG@10 of 0.638 on BEIR and an nDCG@10 of 0.282 on IT-RAG-Bench. However, its median latency of 231.6 ms makes it roughly 14 times slower than the fastest local models tested.
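
nDCG@10 rewards placing highly relevant documents near the top of the first ten results, normalized so that a perfect ranking scores 1.0. The sketch below computes it for a single query using the linear-gain form (gain = relevance grade, discount = log2(rank + 1)); some toolkits use an exponential gain of 2^rel − 1 instead, and the exact variant the study used is not stated. Function names and the relevance judgments are hypothetical.

```python
import math


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain over the first k ranked results."""
    # Ranks are 1-based, so the item at list index i sits at rank i + 1
    # and is discounted by log2(rank + 1) = log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: list[str], qrels: dict[str, float], k: int = 10) -> float:
    """nDCG@k for one query.

    ranked_ids: document IDs in the order the retriever returned them.
    qrels: graded relevance judgments, doc_id -> grade (unjudged = 0).
    """
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


qrels = {"d1": 2, "d3": 1}  # hypothetical judgments for one query
perfect = ndcg_at_k(["d1", "d3", "d2"], qrels)
swapped = ndcg_at_k(["d3", "d1", "d2"], qrels)
print(perfect, swapped)  # the perfect ordering scores 1.0; swapping the top two lowers it
```

Dataset-level figures such as 0.638 are averages of this per-query score over all queries (and, for BEIR, presumably over the four subsets as well).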

For applications with sub-100 ms latency Service Level Agreements (SLAs), mE5-L is a strong alternative. On Italian tasks its nDCG came within 0.003 of GE2's, at a latency of 31 ms, roughly 7.5 times lower than GE2's.
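
The latency comparison reduces to timing each query individually and taking the median. The sketch below assumes a generic `embed` callable; `fake_embed` is a hypothetical stub that sleeps about 5 ms instead of running a real model, and `median_latency_ms` and `meets_sla` are illustrative helper names. It also reproduces the SLA check and speedup arithmetic using the medians reported in the study.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(embed: Callable[[str], object], queries: list[str]) -> float:
    """Time each query on its own and return the median latency in milliseconds."""
    timings = []
    for q in queries:
        start = time.perf_counter()
        embed(q)  # one forward pass (local model) or one API call (hosted model)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def meets_sla(latency_ms: float, budget_ms: float = 100.0) -> bool:
    """True if the latency is strictly under the budget (a "sub-100 ms" SLA)."""
    return latency_ms < budget_ms


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model."""
    time.sleep(0.005)
    return [float(len(text))]


measured = median_latency_ms(fake_embed, ["ciao", "hello", "bonjour"])
print(f"stub median: {measured:.1f} ms, within SLA: {meets_sla(measured)}")

# Latencies reported in the study.
ge2_ms, me5_ms = 231.6, 31.0
print(f"GE2 within 100 ms: {meets_sla(ge2_ms)}")    # False
print(f"mE5-L within 100 ms: {meets_sla(me5_ms)}")  # True
print(f"mE5-L speedup over GE2: {ge2_ms / me5_ms:.1f}x")  # 7.5x
```

The median is used rather than the mean so that occasional slow calls (cold starts, network hiccups) do not dominate the reported figure; an SLA stated as a tail percentile would need `statistics.quantiles` instead.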

LaBSE produced a notable result: despite its widespread use in multilingual deployments, it averaged only 0.188 nDCG@10 on BEIR, scoring below every retrieval model in the study, mMPNet included.

The chunking experiments showed that retrieval quality for all six models saturated at 32-token chunks on this corpus, and that semantic chunking gave measurable gains only at 16-token chunks.
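
Fixed-size chunking, the usual baseline in ablations like this one, slices a document into consecutive windows of N tokens. The sketch below assumes whitespace tokenization in place of each model's subword tokenizer, and pairs it with a simple sentence-packing variant that never splits a sentence. That variant is an illustrative assumption, not necessarily the semantic chunking strategy the study used (semantic chunkers often split where embedding similarity between adjacent sentences drops). Both function names are hypothetical.

```python
import re


def fixed_size_chunks(text: str, chunk_tokens: int) -> list[str]:
    """Split text into consecutive, non-overlapping windows of chunk_tokens tokens.

    Whitespace splitting stands in for the model's real subword tokenizer.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + chunk_tokens]) for i in range(0, len(tokens), chunk_tokens)]


def sentence_packed_chunks(text: str, chunk_tokens: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most chunk_tokens tokens.

    A single sentence longer than the budget becomes its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > chunk_tokens:
            # Adding this sentence would exceed the budget: close the current chunk.
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


doc = "The model encodes the query. It then compares vectors. Finally it returns the best documents."
for size in (8, 16):
    print(size, fixed_size_chunks(doc, size))
    print(size, sentence_packed_chunks(doc, size))
```

At small sizes the fixed-size splitter cuts sentences mid-way while the boundary-aware one keeps them intact; as the budget grows the two converge, which is consistent in spirit with boundary-aware chunking mattering most at the smallest chunk size.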
