
llama.cpp

by ggml-org
Open source · C++ · Global · Free · #ggml

About

Developed by ggml-org, llama.cpp is an open-source C/C++ inference engine for running large language models and multimodal models with minimal setup. It has no external dependencies and is built on the ggml tensor library, which gives it high performance both on local hardware and in the cloud. Key features include integer quantization from 1.5-bit to 8-bit, hardware acceleration across multiple backends (including Metal, CUDA, and Vulkan), and hybrid CPU+GPU inference for models that do not fit entirely in GPU memory. It natively supports the GGUF model format and ships with a built-in REST API server and WebUI.

Features

  • Plain C/C++ implementation with no external dependencies
  • Comprehensive 1.5-bit to 8-bit quantization
  • Multi-backend hardware acceleration
  • CPU+GPU hybrid inference
  • Built-in REST API server and WebUI

Supported Platforms

  • Web
  • Mobile
  • Desktop
  • IoT