Xiaomi has released MiMo-V2.5-Pro, the newest flagship model in its MiMo line.
This release is not being positioned as a general chat upgrade. Xiaomi is aiming it at a narrower and more demanding category: agentic work, complex software engineering, and long-horizon execution. In practice, that means tasks where the model has to keep working over many turns, use tools repeatedly, recover from mistakes, and stay coherent across very long contexts.
Model size and core architecture
MiMo-V2.5-Pro is a trillion-parameter model.
According to Xiaomi's official model overview, it uses:
- 1T total parameters
- 42B active parameters
- 1M-token context window
That makes it a large sparse mixture-of-experts (MoE) model rather than a dense one. The active parameter count matters because it reflects the compute spent per token during inference more directly than the total parameter count alone.
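As a back-of-envelope illustration of why the active count is the relevant number, a common rule of thumb is roughly 2 FLOPs per active parameter per generated token. The 1T / 42B figures are Xiaomi's published specs; the FLOP approximation is a generic heuristic, not a Xiaomi number:

```python
# Rough per-token inference cost for a sparse model, using the common
# ~2 FLOPs per active parameter per token approximation (a generic
# heuristic, not a Xiaomi-published figure).

TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters (published)
ACTIVE_PARAMS = 42_000_000_000     # 42B active per token (published)

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
flops_per_token_sparse = 2 * ACTIVE_PARAMS
flops_per_token_dense = 2 * TOTAL_PARAMS   # if every parameter were active

print(f"Active fraction: {active_fraction:.1%}")                         # 4.2%
print(f"Sparse cost: ~{flops_per_token_sparse / 1e9:.0f} GFLOPs/token")  # ~84
print(f"Dense cost:  ~{flops_per_token_dense / 1e12:.0f} TFLOPs/token")  # ~2
```

Under this heuristic, the sparse design spends about 4% of the per-token compute that a dense 1T model would.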
Xiaomi also describes the system as using an efficient architecture, though the public launch materials do not provide a full layer-by-layer technical report in the way some open-weight releases do. So the key confirmed architectural facts at launch are the 1T total / 42B active design and the 1M context target.
Context length and long-horizon positioning
The 1,048,576-token (2^20) context window is one of the most important published specs.
That places MiMo-V2.5-Pro in the ultra-long-context category and helps explain Xiaomi's focus on long-running agent tasks. A model with a context window at this scale is better suited for:
- large codebases
- long tool traces
- multi-file reasoning
- long-running task histories
- research and software workflows with many intermediate steps
Xiaomi's launch materials repeatedly emphasize long-horizon coherence, which is a more useful framing than just saying "long context." The point is not only that the model can accept large input. The point is that it is supposed to stay coherent while working through complex tasks over time.
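To make the scale concrete, here is a rough capacity estimate for a 1,048,576-token window. The characters-per-token and characters-per-line ratios are generic assumptions for English text and source code, not Xiaomi-published figures:

```python
# Back-of-envelope capacity of the published 1,048,576-token window.
# CHARS_PER_TOKEN and CHARS_PER_CODE_LINE are rough generic heuristics
# (assumptions), not Xiaomi numbers.

CONTEXT_TOKENS = 1_048_576        # 2**20, the published window
CHARS_PER_TOKEN = 4               # rough average for English text/code
CHARS_PER_CODE_LINE = 40          # rough average source-line length

approx_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN
approx_code_lines = approx_chars // CHARS_PER_CODE_LINE

print(f"{CONTEXT_TOKENS:,} tokens is about {approx_chars / 1e6:.1f}M characters")
print(f"or roughly {approx_code_lines:,} lines of code under these assumptions")
```

Even with conservative ratios, a window this size can hold a mid-sized codebase or a very long tool trace in a single pass.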
What Xiaomi says the model is built for
The official release describes MiMo-V2.5-Pro as Xiaomi's most capable model to date and highlights three main capability areas:
- general agentic capability
- complex software engineering
- long-horizon tasks
That is a clearer product definition than many model launches. Xiaomi is not trying to sell this as the best model for every use case. It is saying the model is strongest when given hard tasks that require structured execution over many steps.
The company also says MiMo-V2.5-Pro can sustain tasks involving more than a thousand tool calls, with improved instruction following inside agentic scenarios and stronger coherence over ultra-long contexts.
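A thousand-plus tool calls implies a driver loop of roughly the following shape. This is a hypothetical sketch of the general agent pattern; `run_agent`, `model_step`, and the tool names are invented for illustration, since Xiaomi has not published its agent interface:

```python
# Hypothetical sketch of a long-horizon agent loop with a tool-call
# budget. None of these names are Xiaomi APIs; they illustrate the
# general pattern behind multi-hundred or multi-thousand-call runs.

def run_agent(model_step, tools, task, max_tool_calls=1500):
    """Drive a model/tool loop until the model declares it is done."""
    history = [("task", task)]
    calls = 0
    while calls < max_tool_calls:
        action = model_step(history)          # model picks the next step
        if action["type"] == "finish":
            return {"result": action["result"], "tool_calls": calls}
        observation = tools[action["tool"]](action["args"])
        history.append((action["tool"], observation))  # feed result back
        calls += 1
    raise RuntimeError("tool-call budget exhausted")

# Toy usage: a "model" that calls a no-op tool three times, then finishes.
def toy_model(history):
    if len(history) > 3:
        return {"type": "finish", "result": "done"}
    return {"type": "tool", "tool": "noop", "args": None}

out = run_agent(toy_model, {"noop": lambda _: "ok"}, "demo")
print(out)  # {'result': 'done', 'tool_calls': 3}
```

The hard part of runs like Xiaomi describes is not the loop itself but staying coherent as `history` grows across hundreds of iterations.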
Real task examples from the launch
The most interesting part of Xiaomi's release is that it did not stop at headline benchmarks. It also published several concrete long-run tasks.
1. SysY compiler in Rust
Xiaomi says MiMo-V2.5-Pro implemented a complete SysY compiler in Rust based on a Peking University compiler course project.
The task included:
- lexer
- parser
- AST
- Koopa IR code generation
- RISC-V assembly backend
- performance optimization
According to Xiaomi, the model completed the task in:
- 4.3 hours
- 672 tool calls
- 233/233 on the hidden test suite
That is one of the strongest concrete examples in the launch because it is not just a benchmark score. It is a full autonomous engineering run with a measurable outcome.
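Dividing the published run figures gives the average cadence of the run. These are derived numbers, not figures Xiaomi reported directly:

```python
# Average cadence implied by the published compiler-run figures
# (4.3 hours, 672 tool calls). Derived arithmetic only.

hours = 4.3
tool_calls = 672

seconds_per_call = hours * 3600 / tool_calls
calls_per_minute = tool_calls / (hours * 60)

print(f"~{seconds_per_call:.0f} s per tool call")        # ~23 s
print(f"~{calls_per_minute:.1f} tool calls per minute")  # ~2.6
```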
2. Full desktop video editor
Xiaomi also says MiMo-V2.5-Pro built a working desktop video editor with:
- multi-track timeline
- clip trimming
- cross-fades
- audio mixing
- export pipeline
The published figures for that run are:
- 8,192 lines of code
- 1,868 tool calls
- 11.5 hours of autonomous work
Again, the main point is not only the final artifact. It is that Xiaomi is stressing long autonomous runs rather than single-turn code generation.
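Dividing the published video-editor totals gives a similar picture of the run's cadence (derived arithmetic, not separately reported numbers):

```python
# Ratios computed from the published video-editor run totals
# (8,192 lines, 1,868 tool calls, 11.5 hours). Derived arithmetic only.

hours = 11.5
tool_calls = 1868
lines_of_code = 8192

print(f"~{lines_of_code / tool_calls:.1f} lines of code per tool call")  # ~4.4
print(f"~{lines_of_code / hours:.0f} lines of code per hour")            # ~712
print(f"~{hours * 3600 / tool_calls:.0f} s per tool call")               # ~22
```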
3. Analog EDA optimization
A third example in the launch covers an analog circuit design task: an FVF-LDO design and optimization loop in TSMC 180nm CMOS.
Xiaomi says MiMo-V2.5-Pro was connected to an ngspice simulation loop and iterated toward a design that satisfied all target metrics, including:
- phase margin
- line regulation
- load regulation
- quiescent current
- PSRR
- transient response
This is a notable example because it moves beyond ordinary coding and into model-plus-simulator closed-loop engineering.
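The general shape of such a loop can be sketched as follows. The "simulator" below is a toy stand-in for ngspice (a real run would generate a netlist and parse simulator output), and the parameter and target values are illustrative, not the actual FVF-LDO design numbers:

```python
# Toy sketch of a model-plus-simulator closed loop. simulate() is a
# fake stand-in for ngspice; the single "bias" knob and the target
# values are invented for illustration.

def simulate(params):
    """Fake simulator: maps one bias knob to two metrics."""
    bias = params["bias"]
    return {
        "phase_margin_deg": 45 + 2 * bias,   # rises with bias
        "quiescent_uA": 100 - 5 * bias,      # falls with bias
    }

TARGETS = {
    "phase_margin_deg": (60, float("inf")),  # at least 60 degrees
    "quiescent_uA": (0, 80),                 # at most 80 uA
}

def meets_targets(metrics):
    return all(lo <= metrics[k] <= hi for k, (lo, hi) in TARGETS.items())

# Closed loop: simulate, check targets, adjust, repeat.
params = {"bias": 0}
for step in range(50):
    metrics = simulate(params)
    if meets_targets(metrics):
        break
    params["bias"] += 1  # crude fixed step standing in for the model's reasoning
print(step, params, metrics)  # converges at bias = 8
```

In the real task the "adjust" step is the model reasoning over simulation results, which is what makes the closed-loop framing more demanding than one-shot code generation.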
Benchmark results
The launch and related Xiaomi materials highlight several benchmark numbers for MiMo-V2.5-Pro.
Published figures include:
- SWE-bench Pro: 57.2
- ClawEval: 63.8
- τ3-Bench: 72.9
Xiaomi's homepage and related summaries also position the model as competitive in ClawEval, GDPVal, and SWE-bench Pro, with the company describing it as approaching Claude Opus 4.6 in demanding agentic scenarios.
A separate public model listing also reports:
- Artificial Analysis Intelligence Index: 54
That matters because it places MiMo-V2.5-Pro in the top tier of current general-purpose frontier models, while the official Xiaomi framing stays focused on agentic and long-horizon workloads rather than broad consumer chat.
Coding and agent-specific evaluation
Xiaomi also references its internal MiMo Coding Bench, which it describes as an in-house evaluation suite for coding inside agent frameworks such as Claude Code.
According to Xiaomi's launch materials, this benchmark covers:
- repo understanding
- project building
- code review
- structured artifact generation
- planning
- software engineering tasks
The company's claim is that MiMo-V2.5-Pro narrows the gap with Claude Opus 4.6 on this benchmark and improves the user experience in real-world coding scenarios.
While Xiaomi has not published a full public technical report for MiMo Coding Bench, the benchmark description itself is useful because it shows what the model was optimized for: not isolated code snippets, but framework-driven development workflows.
Token efficiency
One more technical point in the launch is token efficiency.
Xiaomi says MiMo-V2.5-Pro reaches frontier-tier capability while spending significantly fewer tokens per trajectory on ClawEval. The specific published claim is that it reaches 64% Pass³ on ClawEval using about 70K tokens per trajectory, which Xiaomi says is roughly 40–60% fewer tokens than several leading frontier models at comparable capability levels.
That is an important detail because long-horizon agents are expensive. A model that reaches similar results with fewer tokens has a real systems advantage, even before pricing is considered.
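Working backwards from the 40-60% savings claim and the ~70K figure gives the implied per-trajectory usage of the comparison models. This is derived arithmetic from Xiaomi's claim, not separately published data:

```python
# Implied peer token usage, working backwards from the published claim
# of ~70K tokens per trajectory at roughly 40-60% fewer tokens than
# comparable models. Derived arithmetic only.

mimo_tokens = 70_000
savings_low, savings_high = 0.40, 0.60

peer_at_40pct = mimo_tokens / (1 - savings_low)   # ~117K tokens/trajectory
peer_at_60pct = mimo_tokens / (1 - savings_high)  # ~175K tokens/trajectory

print(f"Implied peer usage: {peer_at_40pct / 1e3:.0f}K to "
      f"{peer_at_60pct / 1e3:.0f}K tokens per trajectory")
```

If those figures hold, a peer model would burn up to 2.5x the tokens per trajectory for comparable results, which compounds quickly over thousand-call runs.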
What is still unclear
Even though Xiaomi has published more concrete examples than many model launches, some technical details are still missing from the public materials.
At the time of writing, Xiaomi has not published a full public technical report for MiMo-V2.5-Pro that would answer questions such as:
- exact MoE routing structure
- layer counts
- hidden dimensions
- attention design
- tokenizer changes
- training data mix
- post-training recipe details
- native multimodality status for the Pro model itself
So the current public picture is strongest on:
- parameter scale
- active parameter count
- context length
- benchmark positioning
- long-horizon task examples
and weaker on the lower-level architectural breakdown that researchers often want.
Bottom line
MiMo-V2.5-Pro is Xiaomi's new flagship agent model, built around 1T total parameters, 42B active parameters, and a 1M-token context window.
The published launch materials make three things clear:
- Xiaomi optimized it for agentic execution and long-horizon coherence, not just chat.
- The model is being judged heavily on software engineering and tool-use tasks.
- Xiaomi wants MiMo-V2.5-Pro to be seen as a frontier-class model for autonomous, multi-step work rather than only single-turn reasoning.
The most important technical facts are the 1T / 42B scale, the 1M context, and the benchmark plus task evidence showing that Xiaomi is pushing hard into the long-running agent category.