Ecosystem overview for everything related to ggml.
llama.cpp, developed by ggml-org, is an open-source C/C++ inference engine for running large language models and multimodal models with minimal setup. It has no external dependencies and builds on the ggml tensor library to run inference efficiently both locally and in the cloud. Key features include integer quantization from 1.5-bit to 8-bit, hardware acceleration on multiple platforms (Metal, CUDA, Vulkan), and hybrid CPU+GPU inference. It natively supports the GGUF format and ships with a built-in REST API server and web UI.
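
To make the idea of integer quantization concrete, here is a minimal Python sketch of symmetric block-wise 8-bit quantization. It is an illustration only, not code from ggml or llama.cpp: the block size of 32, the function names, and the per-block scale of `max(|x|) / 127` are assumptions chosen for this example, and the real formats (especially the lower-bit ones) are more involved.

```python
import math

BLOCK_SIZE = 32  # assumed block length for this sketch


def quantize_block(values):
    """Map one block of floats to signed 8-bit integers plus one float scale."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / 127.0
    if scale == 0.0:
        # An all-zero block needs no scaling.
        return 0.0, [0] * len(values)
    # Round to the nearest integer and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, q


def dequantize_block(scale, q):
    """Recover approximate floats from one quantized block."""
    return [scale * x for x in q]


def quantize(values, block_size=BLOCK_SIZE):
    """Split a flat list of floats into blocks and quantize each independently."""
    return [
        quantize_block(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


def dequantize(blocks):
    """Concatenate the dequantized blocks back into one flat list."""
    out = []
    for scale, q in blocks:
        out.extend(dequantize_block(scale, q))
    return out


if __name__ == "__main__":
    weights = [math.sin(0.1 * i) for i in range(64)]
    blocks = quantize(weights)
    restored = dequantize(blocks)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"blocks: {len(blocks)}, max abs error: {max_err:.5f}")
```

Each block stores one float scale and one small integer per weight, which is where the memory savings come from. The rounding error per weight is at most half the block's scale, so blocks with a smaller value range are reconstructed more precisely.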