Google has launched Gemma 4, its most capable open model family to date. The four new models are designed to run on a wide range of devices, from smartphones to workstations.
Built on the same underlying technology as Google's proprietary Gemini 3, the models are published for the first time under the commercially permissive Apache 2.0 license, giving developers full control over their data, infrastructure, and models. Previous Gemma versions shipped under a more restrictive Google proprietary license.
According to Google, all Gemma 4 models deliver significant improvements in multi-step reasoning and mathematical tasks. For agentic workflows, they natively support function calling, structured JSON output, and system instructions, enabling autonomous agents to integrate with various tools and APIs.
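The structured-output side of this can be illustrated with a small sketch. The tool schema below and the simulated model response are hypothetical (the exact format Gemma 4 expects may differ); the point is that a model emitting strict JSON lets an agent validate and dispatch tool calls mechanically.

```python
import json

# Hypothetical tool schema in the JSON-schema style commonly used for
# function calling; not Gemma 4's exact wire format.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def parse_tool_call(model_output: str) -> dict:
    """Parse a structured JSON tool call emitted by the model and
    check that it names a known tool with the required arguments."""
    call = json.loads(model_output)
    if call["name"] != GET_WEATHER_TOOL["name"]:
        raise ValueError(f"unknown tool: {call['name']}")
    for arg in GET_WEATHER_TOOL["parameters"]["required"]:
        if arg not in call["arguments"]:
            raise ValueError(f"missing argument: {arg}")
    return call

# Simulated model response (what a structured-output model would emit).
response = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
call = parse_tool_call(response)
print(call["arguments"]["city"])  # Berlin
```

Because the output is guaranteed JSON rather than free-form text, this parsing step never needs fragile regex extraction.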
Gemma 4 comes in four sizes, covering everything from edge devices to high-end workstations: Effective 2B (E2B), Effective 4B (E4B), a 26B Mixture-of-Experts (MoE) model, and a 31B Dense model. All four move beyond simple chat and can handle complex logic and sophisticated agentic workflows.
| Model | Parameters | Architecture | Context window | Target hardware | Offline operation | Vision (images/video) | Audio input | Quantized on consumer GPU | Arena AI ranking (open) | Special feature |
|---|---|---|---|---|---|---|---|---|---|---|
| E2B | “effective” 2 billion | - | 128K tokens | Smartphones, Raspberry Pi, Jetson Orin Nano | ✅ | ✅ | ✅ | - | - | Compute and memory efficiency on edge devices |
| E4B | “effective” 4 billion | - | 128K tokens | Smartphones, Raspberry Pi, Jetson Orin Nano | ✅ | ✅ | ✅ | - | - | Compute and memory efficiency on edge devices |
| 26B MoE | 3.8 billion active (26 billion total) | MoE | up to 256K tokens | Personal computers, consumer GPUs (quantized), workstations, accelerators | ✅ | ✅ | - | ✅ | #6 | Optimized for latency, fast token generation |
| 31B Dense | 31 billion (all active) | Dense | up to 256K tokens | Personal computers, consumer GPUs (quantized), workstations, accelerators | ✅ | ✅ | - | ✅ | #3 | Maximum quality, base for fine-tuning |
The 31B model currently holds the 3rd position among all open models worldwide on the Arena AI Text Leaderboard, while the 26B MoE model ranks 6th. Google states that Gemma 4 outperforms models 20 times its size. For developers, this translates to high-performance results with significantly reduced hardware requirements.
| Benchmark | Gemma 4 31B IT Thinking | Gemma 4 26B A4B IT Thinking | Gemma 4 E4B IT Thinking | Gemma 4 E2B IT Thinking | Gemma 3 27B IT |
|---|---|---|---|---|---|
| Arena AI (text) (As of 2/6/24) | 1452 | 1441 | - | - | 1365 |
| MMLU (Multilingual Q&A) (No tools) | 85.2% | 82.6% | 69.4% | 60.0% | 67.6% |
| MMMU Pro (Multimodal reasoning) | 76.9% | 73.8% | 52.6% | 44.2% | 49.7% |
| AIME 2026 (Mathematics) (No tools) | 89.2% | 88.3% | 42.5% | 37.5% | 20.8% |
| LiveCodeBench v6 (Competitive coding problems) | 80.0% | 77.1% | 52.0% | 44.0% | 29.1% |
| GPQA Diamond (Scientific knowledge) (No tools) | 84.3% | 82.3% | 58.6% | 43.4% | 42.4% |
| τ2-bench (Agentic tool use) (Retail) | 86.4% | 85.5% | 57.5% | 29.4% | 6.6% |
The two larger models are designed for workstations and servers. The unquantized bfloat16 weights of the 31B model can fit on a single 80 GB NVIDIA H100 GPU, offering powerful local deployment capabilities.
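The hardware claims follow from back-of-the-envelope arithmetic: at 2 bytes per bfloat16 parameter, the 31B model's weights alone take about 62 GB, leaving headroom on an 80 GB H100 for the KV cache and activations (which this sketch deliberately ignores). The 4-bit figure below is an illustrative assumption, since the exact footprint depends on the quantization format.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    # billions of parameters * bytes each = gigabytes (1 GB = 1e9 bytes)
    return params_billion * bytes_per_param

# bfloat16 (2 bytes/param): the 31B dense model's weights alone
print(weight_memory_gb(31, 2))    # 62 GB -> fits in an 80 GB H100
# 4-bit quantization (0.5 bytes/param, format-dependent)
print(weight_memory_gb(31, 0.5))  # 15.5 GB -> consumer-GPU territory
```

Note that KV-cache memory grows with context length, so long-context serving needs additional headroom beyond the weights.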