Gemma 4: 31B Dense Model Ranks #3 Among All Open LLMs
Google released Gemma 4 on April 2, 2026 — four open-weight models under the Apache 2.0 license, ranging from an efficient 2.3B model to a 31B dense model that currently ranks #3 among all open LLMs on the Arena AI text leaderboard. The 31B model scores 80% on LiveCodeBench v6, 84.3% on GPQA Diamond, and 89.2% on the AIME 2026 mathematics benchmark — results that until recently required far larger closed models.
How Gemma 4's Four Model Sizes Break Down
Gemma 4 ships in four configurations designed to cover different hardware and latency requirements. The E2B model has 2.3 billion effective parameters and is designed for on-device deployment with native audio input for speech recognition and understanding. The E4B targets 4 billion effective parameters in a similar form factor. The 26B Mixture-of-Experts model activates only 3.8 billion parameters at inference time, making it dramatically faster per token than its total parameter count implies — it ranks #6 on Arena AI despite firing a small fraction of its weights on each forward pass. The 31B dense model is the flagship, achieving the top-three open model rank and hitting coding, math, and graduate science benchmarks at levels previously exclusive to frontier closed models.
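The parameter counts above translate directly into memory requirements. A rough sketch, using the standard rule of thumb that resident weights need (parameter count) × (bytes per parameter) — KV cache and activations add more on top, so treat these as lower bounds, and note the precision options are assumptions, not quantizations Google has confirmed shipping:

```python
# Rough weight-memory estimates for the Gemma 4 lineup, using the
# parameter counts quoted in the article. Weights alone need roughly
# (parameter count) x (bytes per parameter); KV cache and activation
# memory come on top, so these are lower bounds.

GIB = 1024**3

models = {
    "E2B (2.3B effective)": 2.3e9,
    "E4B (4B effective)": 4.0e9,
    "26B MoE (total weights)": 26e9,
    "31B dense": 31e9,
}

bytes_per_param = {"bf16": 2, "int8": 1, "int4": 0.5}

def weight_gib(params: float, dtype: str) -> float:
    """Approximate resident weight size in GiB at a given precision."""
    return params * bytes_per_param[dtype] / GIB

for name, params in models.items():
    sizes = ", ".join(f"{d}: {weight_gib(params, d):.1f} GiB"
                      for d in bytes_per_param)
    print(f"{name:26s} {sizes}")
```

One wrinkle worth noting: the 26B MoE activates only 3.8B parameters per token, but all 26B weights still have to be resident in memory, so its footprint is closer to the 31B dense than its speed suggests.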
All four models natively process video and images in addition to text, and they handle variable image resolutions rather than locking to fixed patch sizes. The E2B and E4B add native audio input for speech recognition — a capability that is unusual in that parameter class. For developers building multimodal applications on consumer or edge hardware, the smaller Gemma 4 variants provide a capable entry point that runs without expensive cloud inference.
The Apache 2.0 License and What It Actually Allows
The Apache 2.0 license on all four Gemma 4 models is a meaningful detail that separates them from other prominent open-weight releases. Unlike Meta's Llama license, which restricts commercial use above certain usage thresholds and prohibits training competing models, Apache 2.0 imposes no usage limits, no user count restrictions, and no constraints on fine-tuning for competing applications. Companies can fine-tune Gemma 4 on proprietary data, redistribute modified versions, and use model outputs commercially without paying royalties or filing notices with Google.
This matters at the level of enterprise adoption and legal review. A model you can fully own and modify in production sits in a different category from one where commercial terms depend on traffic volume. For procurement and legal teams that have been cautious about Llama-based deployments, Gemma 4's Apache 2.0 license removes friction that held back internal deployment at scale. The combination of performance and permissive licensing in a single release is unusual and should accelerate enterprise evaluation significantly.
What Gemma 4 Means for Open Model Evaluations in 2026
The 31B dense model's Arena AI rank puts it ahead of several models with far larger total parameter counts. The 26B MoE achieves competitive quality with a per-token compute cost closer to a 4 billion parameter model, which changes inference economics for teams running large request volumes. If your workloads are latency-sensitive, the 26B MoE is the configuration to evaluate first. If you are optimizing for reasoning quality — coding tasks, math, or multi-step analysis — the 31B dense is the right comparison point.
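The inference-economics claim can be made concrete with the common first-order approximation that a decoder forward pass costs about 2 FLOPs per *active* parameter per generated token. This ignores attention cost growth with context length, so it is a sketch of relative cost, not a deployment number:

```python
# Per-token compute comparison between the 31B dense model and the
# 26B MoE, using the approximation that decoding costs ~2 FLOPs per
# active parameter per token. Attention overhead at long contexts is
# ignored, so this is a first-order estimate only.

def flops_per_token(active_params: float) -> float:
    """Approximate decode FLOPs per token for a given active-parameter count."""
    return 2 * active_params

dense_31b = flops_per_token(31e9)   # dense: all 31B weights active
moe_26b = flops_per_token(3.8e9)    # MoE: only 3.8B active per token

ratio = dense_31b / moe_26b
print(f"31B dense does ~{ratio:.1f}x the per-token compute of the 26B MoE")
```

That roughly 8x gap in per-token compute is what makes the MoE the natural first choice for latency-sensitive, high-volume workloads, while the dense model buys reasoning quality at a higher serving cost.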
For teams evaluating open models in April 2026, Gemma 4 is the strongest starting point in most weight classes. All weights are available on Hugging Face with documented training datasets and recipes. The benchmark-to-production gap still requires domain-specific evaluation — standardized scores correlate well with real task performance but not perfectly. Run representative workloads before committing to infrastructure decisions.
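The domain-specific evaluation recommended above can be as simple as a scoring loop over your own prompts. A minimal sketch — the `generate` callable is a stub standing in for whatever inference client you actually use (vLLM, a local runtime, an API), and the tasks and exact-match metric are placeholders for your domain data:

```python
# Minimal harness for running representative workloads against a model
# before committing to infrastructure. `generate` is a stub for your
# real inference client; swap in your own tasks and scoring function.

from typing import Callable

def evaluate(generate: Callable[[str], str],
             tasks: list[tuple[str, str]]) -> float:
    """Exact-match accuracy over (prompt, expected_answer) pairs."""
    correct = sum(1 for prompt, expected in tasks
                  if generate(prompt).strip() == expected)
    return correct / len(tasks)

# Example usage with a trivial stub model:
tasks = [("2 + 2 =", "4"), ("Capital of France?", "Paris")]
stub = lambda p: {"2 + 2 =": "4", "Capital of France?": "Rome"}.get(p, "")
print(f"exact-match accuracy: {evaluate(stub, tasks):.2f}")
```

Exact match is the crudest useful metric; for generation-heavy workloads you would replace it with rubric scoring or pairwise comparison, but the loop structure stays the same.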
KEY POINTS:
- Gemma 4 31B ranks #3 among all open LLMs on Arena AI as of April 2026
- 26B MoE activates only 3.8B parameters at inference — competitive quality, much faster
- 89.2% on AIME 2026 math, 80% on LiveCodeBench v6, 84.3% on GPQA Diamond
- All four sizes released under Apache 2.0 — no usage limits, no royalties
- Natively multimodal: video and image input across all model sizes; E2B and E4B add native audio
- E2B and E4B support on-device deployment