Open-Source Models Kimi and MiMo Dethrone Proprietary Giants in Coding Proficiency
Moonshot AI's Kimi K2.6 has surpassed both GPT-5.4 and Claude Opus 4.6 on the rigorous SWE-Bench Pro coding benchmark, while Xiaomi's MiMo V2.5 Pro has beaten Claude Opus 4.6 and landed within half a point of GPT-5.4.
On April 27, 2026, Xiaomi officially open-sourced MiMo V2.5 Pro, the second openly available model in a week to post SWE-Bench Pro scores on par with or ahead of the world’s leading proprietary AI systems. The release came seven days after Moonshot AI’s Kimi K2.6 on April 20, and together the two mark a significant shift in the balance of power between closed corporate labs and the open AI community. Both models now rank among the top performers on SWE-Bench Pro, a grueling test of an AI's ability to solve real-world software engineering problems.
The New Leaders of SWE-Bench Pro
The SWE-Bench Pro benchmark is widely considered the gold standard for evaluating AI coding agents. Unlike its predecessor, SWE-Bench Verified, on which models frequently scored above 70%, the 'Pro' version consists of 1,865 tasks across 41 professional repositories. It requires models to make complex, multi-file modifications and handle long-horizon tasks, and it is designed to minimize the risk of data contamination.
In this high-stakes arena, Moonshot AI’s Kimi K2.6 achieved a record-breaking score of 58.6%. Xiaomi’s MiMo V2.5 Pro followed closely with 57.2%. For comparison, OpenAI’s GPT-5.4 scored 57.7% and Anthropic’s Claude Opus 4.6 scored 53.4%. Kimi K2.6 therefore leads both proprietary models, while MiMo V2.5 Pro beats Claude Opus 4.6 by 3.8 points and sits 0.5 points behind GPT-5.4. Unconfirmed reports put Google’s Gemini 3.1 Pro at 54.2%.
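To make the ordering concrete, the short Python sketch below ranks the scores quoted in this section and computes the percentage-point gaps between models. It uses only the figures reported above (the Gemini number is the unconfirmed one); the helper names `rank` and `margin` are illustrative and not part of any benchmark tooling.

```python
# SWE-Bench Pro scores as quoted in this article (percent of tasks resolved).
# The Gemini 3.1 Pro figure is described as unconfirmed.
SCORES = {
    "Kimi K2.6": 58.6,
    "MiMo V2.5 Pro": 57.2,
    "GPT-5.4": 57.7,
    "Claude Opus 4.6": 53.4,
    "Gemini 3.1 Pro": 54.2,
}


def rank(scores):
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def margin(scores, a, b):
    """Percentage-point gap between model a and model b (positive if a leads)."""
    return round(scores[a] - scores[b], 1)


ranking = rank(SCORES)
for position, (model, score) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {score:.1f}%")
```

Calling `margin(SCORES, "MiMo V2.5 Pro", "GPT-5.4")` gives a negative value, which reflects that MiMo V2.5 Pro sits just behind GPT-5.4 on the reported numbers even though it leads Claude Opus 4.6.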

Architectural Prowess and Scalability
Kimi K2.6 and MiMo V2.5 Pro are both built on Mixture-of-Experts (MoE) architectures, which let them carry massive total parameter counts while activating only a small fraction of those parameters for each token. Kimi K2.6 is a 1-trillion-parameter model with 32 billion active parameters and a 262K context window. According to Moonshot AI's Hugging Face model card, "Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration."

Xiaomi’s MiMo V2.5 Pro pushes these technical boundaries even further. It boasts 1.02 trillion total parameters with 42 billion active parameters and supports a massive context window of up to 1 million tokens. Xiaomi stated in its official documentation that the model is "designed for the most demanding agentic, complex software engineering, and long-horizon tasks," noting its ability to sustain trajectories involving thousands of tool calls with high coherence.
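The efficiency argument for MoE comes down to simple arithmetic: only the active parameters take part in computing each token. The Python sketch below works out that share from the parameter counts quoted above. The function name `active_fraction` is illustrative, and the calculation says nothing about memory use, since all parameters generally still have to be stored.

```python
def active_fraction(total_params, active_params):
    """Share of a model's parameters that are active for each token."""
    if total_params <= 0 or not 0 <= active_params <= total_params:
        raise ValueError("expected 0 <= active_params <= total_params, total_params > 0")
    return active_params / total_params


# Parameter counts as quoted in the article.
MODELS = {
    "Kimi K2.6": {"total": 1.00e12, "active": 32e9},
    "MiMo V2.5 Pro": {"total": 1.02e12, "active": 42e9},
}

for name, params in MODELS.items():
    share = active_fraction(params["total"], params["active"])
    print(f"{name}: {share:.1%} of parameters active per token")
```

On these figures, both models use well under 5% of their total parameters for any single token, which is why per-token compute is closer to that of a 32B or 42B dense model than a trillion-parameter one.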

Efficiency and Cost Disruptions
Beyond raw performance, these open-source entrants are aggressively undercutting the pricing models of proprietary giants. Kimi K2.6’s API is priced at $0.60 per million input tokens and $2 per million output tokens. MiMo V2.5 Pro is slightly higher at $1 per million input and $3 per million output tokens, but still offers a significant discount compared to the tiered pricing of closed-source competitors.
Furthermore, MiMo V2.5 Pro has demonstrated superior token efficiency. In benchmarks like ClawEval, it reportedly uses 40% to 60% fewer tokens than comparable proprietary models to achieve similar results. Because the weights are openly available, developers with the necessary hardware can also run these models locally, replacing per-token API fees with their own infrastructure costs. Local deployment offers a level of control and privacy that proprietary APIs cannot match.
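As a rough illustration of how the quoted prices and the reported token savings interact, the Python sketch below estimates the API cost of a single workload. The prices come from this section; the baseline workload size, the `reduction` parameter, and the assumption that savings apply equally to input and output tokens are all hypothetical choices made for the example.

```python
# API prices quoted in the article, in US dollars per million tokens.
PRICES = {
    "Kimi K2.6": {"input": 0.60, "output": 2.00},
    "MiMo V2.5 Pro": {"input": 1.00, "output": 3.00},
}


def api_cost(model, input_tokens, output_tokens, reduction=0.0):
    """Estimated API cost in dollars for one workload.

    `reduction` is the fraction of tokens saved relative to the given
    baseline counts (0.4 means 40% fewer tokens). Applying it equally to
    input and output tokens is an assumption made for this sketch.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    price = PRICES[model]
    scale = 1.0 - reduction
    cost_in = input_tokens * scale / 1_000_000 * price["input"]
    cost_out = output_tokens * scale / 1_000_000 * price["output"]
    return cost_in + cost_out


# Hypothetical baseline workload: 2M input tokens and 0.5M output tokens.
BASE_IN, BASE_OUT = 2_000_000, 500_000

kimi = api_cost("Kimi K2.6", BASE_IN, BASE_OUT)
mimo = api_cost("MiMo V2.5 Pro", BASE_IN, BASE_OUT)
mimo_40 = api_cost("MiMo V2.5 Pro", BASE_IN, BASE_OUT, reduction=0.4)
mimo_60 = api_cost("MiMo V2.5 Pro", BASE_IN, BASE_OUT, reduction=0.6)

for label, cost in [("Kimi K2.6", kimi), ("MiMo V2.5 Pro", mimo),
                    ("MiMo V2.5 Pro, 40% fewer tokens", mimo_40),
                    ("MiMo V2.5 Pro, 60% fewer tokens", mimo_60)]:
    print(f"{label}: ${cost:.2f}")
```

The reported 40% to 60% savings are measured against proprietary models, whose prices are not given here, so this sketch only shows how a lower token count offsets a higher per-token price. A comparison against a specific proprietary API would need that provider's actual rates.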

Expert Perspective: A Turning Point for Developers
For many in the research community, these results validate years of open-source development. Atal Upadhyay, an AI researcher and blogger, noted that while the industry has been characterized by "magical" demos that often fail in production, the new wave of models feels different. "Then Xiaomi dropped MiMo v2.5 Pro," Upadhyay remarked. "For years, I've watched the AI landscape evolve with a mix of excitement and skepticism... This time, it’s different."
The emergence of other high-performing open-weight models supports this view. Zhipu AI's GLM-5.1 scored 58.4% on SWE-Bench Pro, just behind Kimi K2.6's 58.6%, and Google's Gemma 4 continues to provide a strong foundation for multimodal and reasoning applications.
The Future of Agentic AI
The success of Kimi and MiMo highlights a shift toward "agentic" AI: systems designed not just to chat, but to act. Kimi K2.6 is specifically noted for its ability to coordinate "swarms" of up to 300 sub-agents to tackle massive software projects. As these capabilities become openly available, the barrier to entry for building complex, automated software engineering tools is falling sharply.
Enterprises and individual developers now have access to frontier-level capabilities without being tethered to a specific provider's ecosystem. This democratization could accelerate development cycles and reduce operational expenditures. As the gap between proprietary and open-source models narrows, the focus of the AI industry may shift from who has the largest model to who can deploy the most efficient and specialized agentic swarms.
