llama.cpp

by ggml-org

🔓 Open Source C++ 🌍 Global free #ggml

About

Developed by ggml-org, llama.cpp is a powerful open-source C/C++ inference engine designed to run large language and multimodal models with minimal setup. Operating without external dependencies, it leverages the ggml tensor library for state-of-the-art performance locally and in the cloud. Key features include comprehensive integer quantization (1.5-bit to 8-bit), multi-platform hardware acceleration (Metal, CUDA, Vulkan), and hybrid CPU+GPU inference. It natively supports the GGUF format and includes a built-in REST API server and WebUI.

Features

Plain C/C++ architecture without dependencies
Comprehensive 1.5-bit to 8-bit quantization
Multi-backend hardware acceleration
CPU+GPU hybrid inference
Built-in REST API server and WebUI

Supported Platforms

webmobiledesktopiot

Links

📦 GitHub Repository

llama.cpp

About

Features

Supported Platforms

Links

Related Products