Google DeepMind Unleashes Gemma 4: Apache 2.0 Open-Source Model Boasts Strong Multimodal and Coding Capabilities

Google DeepMind has officially released the Gemma 4 series of open models, generating significant attention within the AI community. The family includes four sizes: E2B, E4B, 26B A4B (MoE), and a 31B dense model. The 31B model is available on Hugging Face under an Apache 2.0 license. This licensing change is crucial, as it removes the usage restrictions of previous Gemma releases' custom Google licenses, greatly facilitating commercial deployment.

Gemma 4 exhibits strong performance across various benchmarks. The 31B model scores 89.2% on AIME 2026 without tools, 80% on LiveCodeBench v6, and achieves a Codeforces Elo of 2150. For comparison, Gemma 3 27B scored 110 on the same Codeforces benchmark. Even the smaller E2B model, with only 2.3 billion effective parameters, beats Gemma 3 27B on MMLU Pro (67.6% vs 60%), while trailing it slightly on GPQA Diamond (42.4% vs 43.4%) and more substantially on LiveCodeBench (29.1% vs 44%).

The 31B model is a dense model with 30.7 billion parameters, a 256K token context window, and a hybrid attention mechanism. This mechanism interleaves local sliding window attention (1024-token window) with global attention layers, with the final layer always being global. For long-context tasks, global layers utilize unified Keys and Values with Proportional RoPE (p-RoPE), which is Google's approach to achieving memory efficiency at scale without significantly compromising reasoning quality.
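The interleaving can be pictured with a short sketch. The 1024-token sliding window and the always-global final layer come from the description above; the local-to-global ratio (here, one global layer per six) is an illustrative assumption, since the article does not state the exact pattern:

```python
LOCAL_WINDOW = 1024  # sliding-window size described in the article

def layer_pattern(num_layers: int, global_every: int = 6) -> list:
    """Label each layer 'local' or 'global' (ratio is an assumption)."""
    kinds = [
        "global" if i % global_every == 0 else "local"
        for i in range(1, num_layers + 1)
    ]
    kinds[-1] = "global"  # per the article, the final layer is always global
    return kinds
```

Local layers attend only within the 1024-token window, which keeps the KV cache small; the periodic global layers are what let information flow across the full 256K context.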

Multimodal support covers text and images, featuring a 550M-parameter vision encoder. The model can process images at variable resolutions using a configurable token budget (70 to 1120 tokens per image). Lower budgets are suitable for speed-sensitive tasks like classification, while higher budgets are used for OCR and document parsing where fine-grained detail is critical. Additionally, the smaller E2B and E4B models support up to 30 seconds of audio input, enabling single-model pipelines for voice applications.
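The trade-off can be made concrete with a hypothetical helper. The 70-1120 token range is from the article; the task names and per-task values are illustrative assumptions, not documented defaults:

```python
BUDGET_RANGE = (70, 1120)  # per-image token budget range from the article

def image_token_budget(task: str) -> int:
    """Pick an image token budget for a task (values are assumptions)."""
    budgets = {
        "classification": 70,    # speed-sensitive: lowest budget
        "captioning": 256,
        "ocr": 1120,             # fine-grained detail: highest budget
        "document": 1120,
    }
    budget = budgets.get(task, 256)  # assumed mid-range fallback
    lo, hi = BUDGET_RANGE
    return max(lo, min(hi, budget))
```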

A configurable “thinking mode” is built into Gemma 4. It can be activated by including <|think|> in the system prompt and disabled by removing it. The model outputs its reasoning trace within <|channel>thought [reasoning]<channel|> blocks before the final answer. In multi-turn conversations, it is important to strip the thinking content from the history before the next user turn, as thinking traces are not intended to be passed back.
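In practice, that stripping can be a single regex pass over each assistant reply before it is appended to the history. A minimal sketch, assuming the delimiters quoted above appear verbatim in the model output:

```python
import re

# Match a thinking trace delimited as described in the article.
THINK_RE = re.compile(r"<\|channel>thought.*?<channel\|>", re.DOTALL)

def strip_thinking(reply: str) -> str:
    """Remove thinking traces so they are not fed back in the next turn."""
    return THINK_RE.sub("", reply).strip()
```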

Coding is a clear strength of Gemma 4. The 31B model's Codeforces Elo of 2150 represents a significant leap for open-weight models of its size. On the r/LocalLLaMA subreddit, a user posted a screenshot showing the 31B ranking above GLM-5 on LMSys, a notable achievement given GLM-5's reputation.

The model is available on Hugging Face and loads via the standard Transformers interface. For text and image inputs, AutoProcessor and AutoModelForCausalLM can be used. For working with images or video (or audio on the E2B/E4B variants), AutoModelForMultimodalLM is recommended.
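A minimal loading sketch using those classes. The model ID `google/gemma-4-31b` is an assumption (check the actual model card), and `device_map="auto"` requires the `accelerate` package:

```python
from transformers import AutoProcessor, AutoModelForCausalLM

# Assumed Hugging Face repo ID; verify against the published model card.
MODEL_ID = "google/gemma-4-31b"

def load_gemma4(model_id: str = MODEL_ID):
    """Load the processor (text + image preprocessing) and the model."""
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return processor, model
```

The processor builds the combined text-and-image inputs; for the audio-capable E2B/E4B variants, the article recommends `AutoModelForMultimodalLM` instead.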
