A recent benchmark study evaluates Google Embeddings (GE2), a bi-encoder hosted on Vertex AI with a 2,048-token context window and explicit task-type conditioning, against five widely used open-source models: BGE-M3, E5-large, Multilingual-E5-large (mE5-L), LaBSE, and Paraphrase-Multilingual-MPNet (mMPNet).
The evaluation covers four BEIR subsets, a synthetic Italian RAG corpus (IT-RAG-Bench), a chunking ablation over five chunk sizes (in tokens) and three chunking strategies, and per-query latency measured on commodity CPU hardware.
GE2 ranked first on every task, with an average nDCG@10 of 0.638 on BEIR and 0.282 on IT-RAG-Bench. However, its median latency of 231.6 ms makes it roughly 14 times slower than the fastest local models tested.
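For readers unfamiliar with the metric, nDCG@10 can be sketched as follows. This is a minimal illustration that assumes the linear-gain formulation (relevance divided by a log2 rank discount); the study's own evaluation tooling is not described here, and the function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance grades, in ranked order.

    Assumption: linear gain, i.e. rel / log2(rank + 1) with ranks starting at 1.
    """
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances: relevance grades of the retrieved documents, in rank order.
    all_relevances: relevance grades of every judged document for the query.
    """
    ideal_dcg = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# A single relevant document returned at rank 2 instead of rank 1.
score = ndcg_at_k([0, 1, 0], [1, 0, 0])
```

A benchmark-level figure such as the averages quoted above would then be the mean of this per-query score over all queries in a dataset.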
Where sub-100 ms service-level agreements (SLAs) are critical, mE5-L is a competitive alternative: on the Italian tasks its nDCG is within 0.003 of GE2's, at a latency of 31 ms.
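A per-query latency check against such a budget might look like the sketch below. It is illustrative only: `encode` stands in for whatever embedding call is being timed, and the warm-up count and helper names are assumptions, not details of the study's measurement protocol.

```python
import statistics
import time


def median_latency_ms(encode, queries, warmup=3):
    """Median wall-clock time per query in milliseconds.

    encode: any callable taking one query string (hypothetical placeholder).
    warmup: number of untimed calls made first to absorb one-off setup costs.
    """
    for query in queries[:warmup]:
        encode(query)
    timings = []
    for query in queries:
        start = time.perf_counter()
        encode(query)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def meets_sla(latency_ms, budget_ms=100.0):
    """True when the measured latency is under the latency budget."""
    return latency_ms < budget_ms


# The two latencies reported in the text, checked against a 100 ms budget.
ge2_ok = meets_sla(231.6)
me5_ok = meets_sla(31.0)
```

The median is used rather than the mean because a handful of slow requests, such as network retries for a hosted model, would otherwise dominate the figure.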
LaBSE, despite its widespread multilingual deployment, scored an average nDCG@10 of only 0.188 on BEIR, underperforming all dedicated retrieval models, including mMPNet. In the chunking experiments, all six models saturated at 32-token chunks on the given corpus, and semantic chunking yielded measurable gains only at 16-token chunks.
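To make the chunk sizes concrete, fixed-size chunking can be sketched as below. This assumes whitespace-separated words as a stand-in for model tokens and non-overlapping chunks; the study's tokenizer and its three chunking strategies are not specified here, and `chunk_tokens` is a hypothetical helper.

```python
def chunk_tokens(tokens, size):
    """Split a token list into consecutive, non-overlapping chunks of `size` tokens.

    The final chunk may be shorter when len(tokens) is not a multiple of size.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# Assumption: whitespace splitting approximates tokenization for illustration only.
text = " ".join(f"w{i}" for i in range(40))
tokens = text.split()

chunks_16 = chunk_tokens(tokens, 16)  # chunk lengths: 16, 16, 8
chunks_32 = chunk_tokens(tokens, 32)  # chunk lengths: 32, 8
```

A real ablation would count tokens with each model's own tokenizer, since a chunk of 32 words is generally not 32 subword tokens.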