News

Google Gemma 4 Released: Apache 2.0 Licensed, Major Performance and Efficiency Gains Across Four Models

Google officially released Gemma 4 on April 2, 2026, marking a significant generational leap for open models in its parameter range. Notably, this is the first time a Gemma family model has shipped under the Apache 2.0 license, permitting commercial use without requiring explicit permission. Since its inception, the Gemma family has seen over 400 million downloads and inspired more than 100,000 model variants.

The Gemma 4 family comprises four distinct models, each optimized for a different segment of the hardware spectrum:

  • E2B: Features effectively 2 billion active parameters. It's designed for resource-constrained devices like smartphones, Raspberry Pi, and Jetson Orin Nano, offering a 128K context window. This model handles images, video, and audio natively, prioritizing battery and memory efficiency.

  • E4B: Has an effective 4 billion active parameters, targeting the same hardware as E2B but delivering higher reasoning quality. It's approximately three times slower than E2B but significantly more capable, and likewise supports images, video, and audio. Compared to the prior generation, E4B delivers up to 4x faster performance and 60% lower battery consumption.

  • 26B MoE: A Mixture-of-Experts (MoE) model with 26 billion total parameters, where only 3.8 billion activate during inference. It supports a context window of up to 256K tokens and is ranked 6th among open models on the Arena AI Text Leaderboard. Quantized versions can run on consumer GPUs.

  • 31B Dense: The flagship model, featuring a full dense architecture and a 256K context window. It currently ranks 3rd among open models on Arena AI. The unquantized version fits on a single 80 GB H100 GPU, while quantized versions are runnable on consumer hardware, making it an ideal base for fine-tuning.

A key distinction is that E2B and E4B natively process audio input, a capability not present in the 26B MoE and 31B Dense models. Applications requiring speech recognition should consider the edge models within this family.
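The hardware claims above can be sanity-checked with back-of-envelope arithmetic. The sketch below estimates weight memory for the 31B dense model at different precisions and the active-parameter fraction of the 26B MoE, using only the figures quoted in this article; real deployments also need memory for the KV cache, activations, and runtime overhead, so treat these as rough lower bounds rather than official requirements.

```python
def weight_gib(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight-only memory in GiB for a given parameter count."""
    return params_billion * 1e9 * bytes_per_param / 2**30

# 31B dense at bf16 (2 bytes/param): comfortably under one 80 GB H100.
dense_bf16 = weight_gib(31, 2.0)   # ~57.7 GiB
# 31B dense at 4-bit quantization (0.5 bytes/param): consumer-GPU range.
dense_int4 = weight_gib(31, 0.5)   # ~14.4 GiB

# 26B MoE: only 3.8B of 26B total parameters activate per token.
active_fraction = 3.8 / 26         # ~14.6% of weights active at inference

print(f"31B dense, bf16: {dense_bf16:.1f} GiB")
print(f"31B dense, int4: {dense_int4:.1f} GiB")
print(f"26B MoE active:  {active_fraction:.1%}")
```

This lines up with the article's claims: the unquantized flagship fits on a single 80 GB card, while 4-bit quantization brings it into consumer-GPU territory.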

Google asserts Gemma 4 outperforms models 20 times its size. Third-party benchmarks from Artificial Analysis provide compelling evidence:

  • Scientific Reasoning (GPQA Diamond): The 31B model achieved an 85.7% score in reasoning mode, placing it second among open models under 40 billion parameters, narrowly behind Qwen3.5 27B (85.8%). The 31B's efficiency is notable: it generated roughly 1.2 million output tokens versus 1.5 million for Qwen3.5 27B at comparable quality. The 26B MoE scored 79.2% on GPQA Diamond, surpassing OpenAI's gpt-oss-120B (76.2%) despite having 94 billion fewer total parameters, a remarkable efficiency gap.

  • Agentic Tool Use (τ2-bench Retail): Performance in multi-step tool use saw a dramatic improvement. The 31B scored 86.4% and the 26B MoE scored 85.5%, a significant leap from Gemma 3 27B's 6.6% on the same benchmark, indicating non-incremental changes in handling agentic tasks.

  • Math and Coding (AIME 2026 & LiveCodeBench v6): Similar substantial gains were observed. On AIME 2026, the 31B and 26B achieved 89.2% and 88.3% respectively, compared to Gemma 3 27B's 20.8%. For LiveCodeBench v6, the 31B scored 80.0% and the 26B scored 77.1%, up from Gemma 3 27B's 29.1%.

  • Edge Model Performance: The E4B model, designed for constrained environments, achieved 52.0% on LiveCodeBench and 58.6% on GPQA Diamond, respectable results given its hardware targets.
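The scale of the generational jump is easiest to see as ratios. The sketch below computes the improvement factors over Gemma 3 27B and the GPQA output-token savings, using only the scores quoted above; benchmark numbers are as reported by Artificial Analysis, not independently verified here.

```python
# Scores (percent) quoted in the article for Gemma 3 27B vs Gemma 4 31B.
gemma3_27b = {"tau2-bench Retail": 6.6, "AIME 2026": 20.8, "LiveCodeBench v6": 29.1}
gemma4_31b = {"tau2-bench Retail": 86.4, "AIME 2026": 89.2, "LiveCodeBench v6": 80.0}

for bench, old in gemma3_27b.items():
    new = gemma4_31b[bench]
    # e.g. tau2-bench Retail: 6.6% -> 86.4%, a ~13.1x improvement
    print(f"{bench}: {old:.1f}% -> {new:.1f}% ({new / old:.1f}x)")

# Token efficiency on GPQA Diamond: ~1.2M vs ~1.5M output tokens.
savings = 1 - 1.2e6 / 1.5e6
print(f"Output-token savings vs Qwen3.5 27B: {savings:.0%}")  # 20%
```

The agentic-tool-use gain in particular (roughly 13x) is what makes the "non-incremental" framing credible: it is a different capability tier, not a tuned-up version of the same one.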
