Vector search underpins most retrieval-augmented generation (RAG) pipelines, but at scale its memory cost becomes prohibitive. Storing 10 million document embeddings in float32 consumes roughly 31 GB of RAM (a figure consistent with 768-dimensional vectors), a serious constraint for development teams running local or on-premise inference.
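The float32 figure is easy to sanity-check. The sketch below computes the raw size of a flat float32 embedding matrix; the helper name `float32_bytes` and the two dimensions tried are assumptions for illustration, since the article does not state the embedding dimension.

```python
def float32_bytes(num_vectors: int, dim: int) -> int:
    """Raw size of a flat float32 embedding matrix (no index overhead)."""
    return num_vectors * dim * 4  # 4 bytes per float32 component


# The dimensions below are assumptions; the article does not state one.
for dim in (768, 1536):
    gb = float32_bytes(10_000_000, dim) / 1e9
    print(f"dim={dim}: {gb:.1f} GB")
# dim=768: 30.7 GB
# dim=1536: 61.4 GB
```

At 768 dimensions the result lands on the quoted 31 GB; larger embedding models would roughly double it.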
A new open-source library called Turbovec addresses this issue directly. Written in Rust with Python bindings, it is built on TurboQuant, a quantization algorithm from Google Research. With Turbovec, the same 10-million-document corpus fits into just 4 GB of RAM. On ARM hardware, its search is 12–20% faster than FAISS IndexPQFastScan.
TurboQuant was introduced by Google’s research team as a data-oblivious quantizer. It achieves near-optimal distortion rates across all bit-widths and dimensions while requiring zero training and zero passes over the data. Most production-grade quantizers, including FAISS’s Product Quantization (PQ), require a codebook training step, often k-means clustering on a representative sample. If the data distribution shifts, the codebook usually has to be retrained and the corpus fully re-indexed. TurboQuant avoids this by relying on analytical properties of rotated vectors instead of data-dependent calibration.
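To make "zero training" concrete, here is a minimal sketch of how a scalar codebook can be derived from a known density alone, with no sample of the corpus involved. It runs the Lloyd-Max iteration on the standard normal N(0, 1). The function names are hypothetical, and this is an illustration of the general technique, not Turbovec's or TurboQuant's actual code.

```python
import math


def pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lloyd_max_gaussian(bits: int, iters: int = 500) -> tuple[list[float], list[float]]:
    """Return (boundaries, centroids) of a 2**bits-level quantizer for N(0, 1)."""
    levels = 2 ** bits
    # Any increasing initial guess works; start with evenly spaced centroids.
    centroids = [-2.0 + 4.0 * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iters):
        # Each boundary sits halfway between neighbouring centroids.
        inner = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
        edges = [-math.inf] + inner + [math.inf]
        # Each centroid becomes the mean of N(0, 1) restricted to its bucket:
        # E[x | lo < x < hi] = (pdf(lo) - pdf(hi)) / (cdf(hi) - cdf(lo)).
        centroids = [
            (pdf(lo) - pdf(hi)) / (cdf(hi) - cdf(lo))
            for lo, hi in zip(edges, edges[1:])
        ]
    boundaries = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
    return boundaries, centroids


boundaries, centroids = lloyd_max_gaussian(bits=2)
print([round(c, 3) for c in centroids])  # roughly [-1.51, -0.453, 0.453, 1.51]
```

Because the inputs are only the density and the bit-width, the resulting table can be computed once and reused for any corpus. For coordinates distributed as N(0, 1/d), the same values are simply divided by sqrt(d).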
The Turbovec quantization pipeline has four steps. (1) Normalization: each vector's norm is stored as a single float, and the vector itself is reduced to a unit direction on the hypersphere. (2) Random rotation: multiplying by a random orthogonal matrix makes every coordinate follow a predictable distribution, which converges to the Gaussian N(0, 1/d) in high dimensions. (3) Lloyd-Max scalar quantization: because that distribution is known in advance, the optimal bucket boundaries are precomputed mathematically, with no pass over the data. (4) Bit-packing: the per-coordinate codes are packed tightly, shrinking a 1536-dimensional vector from 6,144 bytes in FP32 to 384 bytes at 2 bits per coordinate, a 16x compression ratio (not counting the stored norm).
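The four steps can be sketched in plain Python. This is a toy illustration under stated assumptions, not Turbovec's implementation: the function names are hypothetical, the rotation is built with Gram-Schmidt on Gaussian rows, the 2-bit boundaries and centroids are rounded Lloyd-Max table values for N(0, 1), and a small dimension (64) is used so the pure-Python rotation stays fast.

```python
import math
import random

# 2-bit Lloyd-Max values for N(0, 1) (standard table values, rounded).
BOUNDARIES = (-0.9816, 0.0, 0.9816)
CENTROIDS = (-1.510, -0.4528, 0.4528, 1.510)


def random_orthogonal(d: int, seed: int = 0) -> list[list[float]]:
    """Random d x d orthogonal matrix via Gram-Schmidt on Gaussian rows."""
    rng = random.Random(seed)
    rows: list[list[float]] = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:  # remove the components along earlier rows
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def encode(vec: list[float], rot: list[list[float]]) -> tuple[float, bytes]:
    """Normalize, rotate, quantize to 2 bits per coordinate, and bit-pack."""
    d = len(vec)
    norm = math.sqrt(sum(x * x for x in vec))            # step 1: keep the norm
    unit = [x / norm for x in vec]
    rotated = [sum(r * x for r, x in zip(row, unit)) for row in rot]  # step 2
    scale = math.sqrt(d)  # rescale N(0, 1/d) coordinates to N(0, 1) units
    codes = [sum(y * scale > b for b in BOUNDARIES) for y in rotated]  # step 3
    packed = bytearray((d + 3) // 4)                     # step 4: 4 codes per byte
    for i, c in enumerate(codes):
        packed[i // 4] |= c << (2 * (i % 4))
    return norm, bytes(packed)


def decode(norm: float, packed: bytes, rot: list[list[float]]) -> list[float]:
    """Approximate reconstruction: unpack, look up centroids, rotate back."""
    d = len(rot)
    scale = math.sqrt(d)
    y = [CENTROIDS[(packed[i // 4] >> (2 * (i % 4))) & 3] / scale for i in range(d)]
    # rot has orthonormal rows, so its inverse is its transpose.
    return [norm * sum(rot[i][j] * y[i] for i in range(d)) for j in range(d)]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


d = 64
rot = random_orthogonal(d)
rng = random.Random(1)
vec = [rng.uniform(-1.0, 1.0) for _ in range(d)]
norm, packed = encode(vec, rot)
approx = decode(norm, packed, rot)
cos = dot(vec, approx) / math.sqrt(dot(vec, vec) * dot(approx, approx))
print(len(packed), "bytes; cosine similarity to original:", round(cos, 3))
```

The 64-dimensional vector packs into 16 bytes. With d = 1536 the same packing gives 1536 * 2 / 8 = 384 bytes, matching the figure quoted above.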
During search, the query is rotated into the same domain as the stored codes, and scoring is done against the precomputed codebook values using SIMD intrinsics: NEON on ARM, and AVX-512BW or AVX2 on x86. This lets Turbovec keep both recall and throughput competitive. In benchmarks involving 100K vectors and 1,000 queries, Turbovec holds up as an alternative to FAISS for memory-constrained environments.
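The scoring step can be illustrated with a scalar sketch. It assumes 2-bit codes packed four per byte (lowest bits first), rounded Lloyd-Max centroids for N(0, 1), and a query that has already been rotated; the names `unpack` and `score` are hypothetical, and a real SIMD kernel would process many codes per instruction rather than looping like this.

```python
import math

# 2-bit Lloyd-Max centroids for N(0, 1) (standard table values, rounded).
CENTROIDS = (-1.510, -0.4528, 0.4528, 1.510)


def unpack(packed: bytes, d: int) -> list[int]:
    """Recover d two-bit codes, four per byte, lowest bits first."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 3 for i in range(d)]


def score(rotated_query: list[float], norm: float, packed: bytes) -> float:
    """Approximate <query, document> from the compressed document."""
    d = len(rotated_query)
    scale = math.sqrt(d)  # centroids are in N(0, 1) units; coordinates are N(0, 1/d)
    # Per-coordinate lookup table: table[i][c] = q_i * centroid_c / sqrt(d).
    table = [[q * c / scale for c in CENTROIDS] for q in rotated_query]
    codes = unpack(packed, d)
    # Sum the looked-up products, then restore the document's original length.
    return norm * sum(table[i][c] for i, c in enumerate(codes))


# One byte holding the codes 0, 1, 2, 3 for a 4-dimensional toy document.
doc = bytes([0b11100100])
print(unpack(doc, 4))                            # [0, 1, 2, 3]
print(score([1.0, 0.0, 0.0, 0.0], 2.0, doc))     # 2.0 * (-1.510 / 2) = -1.51
```

Because the rotation is orthogonal, the inner product between the rotated query and the rotated document equals the inner product in the original space, so only the query needs to be rotated at search time.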