Mistral, GLM, and MiniMax: The Models Nobody Expected

The open source LLM landscape in 2025-2026 isn't just Meta, DeepSeek, and Alibaba. There's a second layer of players delivering results under the radar — and in some benchmarks, they surpass even the favorites.

Mistral AI from France, GLM from the Tsinghua University spinoff Z.ai in China, and MiniMax are three cases that deserve attention from anyone seriously following the field.

Mistral: French Efficiency with Open License

Mistral AI built its reputation on combining two elements: compact, highly efficient models and the Apache 2.0 license, one of the most permissive licenses for commercial use.

The flagship model in the line is Mixtral 8x22B: 141 billion total parameters in a Mixture of Experts architecture with 8 experts, of which only 2 are activated per token (~39B active parameters). This delivers the capacity of a large model at the per-token inference cost of a much smaller one.
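
To make "8 experts, 2 active per token" concrete, here is a minimal sketch of top-2 routing in plain Python. It illustrates the general technique only and is not Mistral's implementation: the function names top_k_route and moe_forward, the scalar toy "experts", and the router scores are all assumptions made up for the example.

```python
import math

def top_k_route(router_logits, k=2):
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    # Indices of the k largest router scores (ties keep the lower index first).
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and combine their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Toy setup (hypothetical): 8 "experts", each just multiplies its input by 1..8.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]  # made-up router scores for one token

routes = top_k_route(logits)
output = moe_forward(3.0, experts, logits)
print(routes)  # experts 1 and 4 are selected, each with weight 0.5
print(output)  # 0.5 * (2 * 3.0) + 0.5 * (5 * 3.0) = 10.5
```

Only the two selected experts run for this token; the other six cost nothing at inference time. Note that 39B active is a bit more than 2/8 of 141B, because parts of the network, such as the attention layers, are shared and run for every token.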

In benchmarks, Mixtral 8x22B achieves 77.8% on MMLU and 41.8% on HumanEval (coding), positioning itself as a solid generalist model — not the best in any single category, but competitive in all.

Mistral's differentiator isn't at the top of benchmarks. It's in practical viability:

Apache 2.0: commercial use without the usage restrictions of Meta's Llama Community License
Manageable size: runs on mid-range infrastructure without complex optimizations
Documented fine-tuning: mature ecosystem for customization
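
What "manageable size" means in practice depends heavily on numeric precision. The following is a rough, weights-only estimate using the 141B figure above; it ignores KV cache, activations, and framework overhead, and the helper name weight_memory_gb is made up for the example.

```python
def weight_memory_gb(total_params_billion, bits_per_param):
    """Weights-only memory in GB (1 GB = 1e9 bytes).

    Assumption: ignores KV cache, activations, and runtime overhead.
    """
    return total_params_billion * 1e9 * bits_per_param / 8 / 1e9

# Mixtral 8x22B: all 141B parameters must be resident in memory,
# even though only ~39B are used for any given token.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(141, bits):.1f} GB")
# 16-bit: 282.0 GB
# 8-bit: 141.0 GB
# 4-bit: 70.5 GB
```

The Mixture of Experts design saves compute per token, not memory: the hardware budget is set by the total parameter count, while speed is closer to that of a ~39B model.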

For companies needing a base model for proprietary fine-tuning — without license restrictions — Mixtral 8x22B remains one of the safest legal choices.

GLM-4.7 and GLM-5: The Tsinghua Project

GLM (General Language Model) is developed by Z.ai (formerly Zhipu AI), a company born as a spinoff of Tsinghua University in 2019 and today operating independently, valued at approximately US$ 3-4 billion. Recent results grabbed attention.

GLM-4.7 achieves:

AIME 2025 (competition mathematics): 95.7%, one of the highest scores recorded
GPQA Diamond (scientific reasoning): 85.7%
LiveCodeBench (real coding): 84.9%
IFEval (instruction following): 88.0%
Context: 200K tokens

These numbers place GLM-4.7 at the top of the open source leaderboard in multiple categories — competing directly with much larger models.

GLM-5, the larger successor, achieved 1451 points on Chatbot Arena — the highest score ever recorded by an open source model on this human preference platform.

MiniMax M2.5: The Software Engineering Specialist

MiniMax M2.5 posts a number that no other model on the leaderboard could match: 80.2% on SWE-bench Verified, the benchmark that measures the ability to resolve real GitHub issues.

For those unfamiliar with SWE-bench: it submits real issues from open source repositories to the model and evaluates whether the model can write a patch that passes automated tests. It's the benchmark closest to real engineering work.
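
The evaluation loop described above can be sketched roughly as follows. This is a simplified stand-in, not the actual SWE-bench harness: here a "repository" is a dict of functions, a "patch" is a dict of replacement functions, and Task, run_test, is_resolved, and resolve_rate are hypothetical names. In this sketch a task counts as resolved only if the tests that failed before the fix now pass and the tests that passed before still pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Repo = Dict[str, Callable]      # toy stand-in for a code repository
Test = Callable[[Repo], bool]   # a test inspects the repo and returns pass/fail

@dataclass
class Task:
    issue: str                  # the issue text shown to the model
    repo: Repo                  # the code before any fix
    fail_to_pass: List[Test]    # tests that must start passing after the patch
    pass_to_pass: List[Test] = field(default_factory=list)  # tests that must keep passing

def run_test(test: Test, repo: Repo) -> bool:
    """A test that raises counts as a failure."""
    try:
        return bool(test(repo))
    except Exception:
        return False

def is_resolved(task: Task, patch: Repo) -> bool:
    """Apply the patch to a copy of the repo and run every test."""
    patched = {**task.repo, **patch}
    return all(run_test(t, patched) for t in task.fail_to_pass + task.pass_to_pass)

def resolve_rate(tasks: List[Task], patches: List[Repo]) -> float:
    """Fraction of tasks whose patch resolves the issue."""
    return sum(is_resolved(t, p) for t, p in zip(tasks, patches)) / len(tasks)

# Toy task (made up for illustration).
task = Task(
    issue="mean([]) raises ZeroDivisionError; it should return 0.0",
    repo={"mean": lambda xs: sum(xs) / len(xs)},
    fail_to_pass=[lambda r: r["mean"]([]) == 0.0],
    pass_to_pass=[lambda r: r["mean"]([2, 4]) == 3.0],
)
good_patch = {"mean": lambda xs: sum(xs) / len(xs) if xs else 0.0}
bad_patch = {"mean": lambda xs: 0.0}  # fixes the issue but breaks existing behavior

rate = resolve_rate([task, task], [good_patch, bad_patch])
print(rate)  # 0.5
```

The second patch shows why the score is hard to game: silencing the reported error is not enough if the change breaks behavior that already worked.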

No open source model had reached this level before. This makes MiniMax M2.5 the strongest choice for autonomous software development agents.

What These Models Have in Common

The three — Mistral, GLM, and MiniMax — represent an important phenomenon: the decentralization of the frontier in AI.

The frontier is no longer concentrated in four or five American laboratories. It's distributed across Tsinghua, Paris, Shanghai, and dozens of other research centers that work quietly and release results that surprise the market.

For datacenter operators and platform teams, this means model evaluation must go beyond the big names. Two years ago, the GLM family was on almost no one's radar; today GLM-4.7 scores 95.7% on AIME.

Conclusion

Mistral, GLM, and MiniMax prove that the race for the best open source LLM is more competitive than popularity rankings suggest.

Following only the most-starred models on GitHub means missing results that, in specific cases, are the best available from any model, open or closed.

Published on Hive.blog | #ArtificialInteligence #llm