Xiaomi has released MiMo-V2.5-Pro, the newest flagship model in its MiMo line.
This release is not being positioned as a general chat upgrade. Xiaomi is aiming it at a narrower and more demanding category: agentic work, complex software engineering, and long-horizon execution. In practice, that means tasks where the model has to keep working over many turns, use tools repeatedly, recover from mistakes, and stay coherent across very long contexts.
Model size and core architecture
MiMo-V2.5-Pro is a trillion-parameter model.
According to Xiaomi's official model overview, it uses:
- 1T total parameters
- 42B active parameters
- 1M-token context window
That makes it a large sparse mixture-of-experts (MoE) model rather than a dense one. The active parameter count matters because it reflects the compute spent per token during inference more directly than the total parameter count alone.
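As a back-of-envelope illustration of why the active count is the relevant number, a common rule of thumb is roughly 2 FLOPs per active parameter per generated token. The 1T / 42B figures are Xiaomi's published specs; the FLOP approximation is a generic heuristic, not a Xiaomi number:

```python
# Rough per-token inference cost for a sparse model, using the common
# ~2 FLOPs per active parameter per token approximation (a generic
# heuristic, not a Xiaomi-published figure).

TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters (published)
ACTIVE_PARAMS = 42_000_000_000     # 42B active per token (published)

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
flops_per_token_sparse = 2 * ACTIVE_PARAMS
flops_per_token_dense = 2 * TOTAL_PARAMS   # if every parameter were active

print(f"Active fraction: {active_fraction:.1%}")                         # 4.2%
print(f"Sparse cost: ~{flops_per_token_sparse / 1e9:.0f} GFLOPs/token")  # ~84
print(f"Dense cost:  ~{flops_per_token_dense / 1e12:.0f} TFLOPs/token")  # ~2
```

Under this heuristic, the sparse design spends about 4% of the per-token compute that a dense 1T model would.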
Xiaomi also describes the system as using an efficient architecture, though the public launch materials do not provide a full layer-by-layer technical report in the way some open-weight releases do. So the key confirmed architectural facts at launch are the 1T total / 42B active design and the 1M context target.
Context length and long-horizon positioning
The 1,048,576-token (2^20) context window is one of the most important published specs.
That places MiMo-V2.5-Pro in the ultra-long-context category and helps explain Xiaomi's focus on long-running agent tasks. A model with a context window at this scale is better suited for:
- large codebases
- long tool traces
- multi-file reasoning
- long-running task histories
- research and software workflows with many intermediate steps
Xiaomi's launch materials repeatedly emphasize long-horizon coherence, which is a more useful framing than just saying "long context." The point is not only that the model can accept large input. The point is that it is supposed to stay coherent while working through complex tasks over time.
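To make the scale concrete, here is a rough capacity estimate for a 1,048,576-token window. The characters-per-token and characters-per-line ratios are generic assumptions for English text and source code, not Xiaomi-published figures:

```python
# Back-of-envelope capacity of the published 1,048,576-token window.
# CHARS_PER_TOKEN and CHARS_PER_CODE_LINE are rough generic heuristics
# (assumptions), not Xiaomi numbers.

CONTEXT_TOKENS = 1_048_576        # 2**20, the published window
CHARS_PER_TOKEN = 4               # rough average for English text/code
CHARS_PER_CODE_LINE = 40          # rough average source-line length

approx_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN
approx_code_lines = approx_chars // CHARS_PER_CODE_LINE

print(f"{CONTEXT_TOKENS:,} tokens is about {approx_chars / 1e6:.1f}M characters")
print(f"or roughly {approx_code_lines:,} lines of code under these assumptions")
```

Even with conservative ratios, a window this size can hold a mid-sized codebase or a very long tool trace in a single pass.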
What Xiaomi says the model is built for
The official release describes MiMo-V2.5-Pro as Xiaomi's most capable model to date and highlights three main capability areas:
- general agentic capability
- complex software engineering
- long-horizon tasks
That is a clearer product definition than many model launches. Xiaomi is not trying to sell this as the best model for every use case. It is saying the model is strongest when given hard tasks that require structured execution over many steps.
The company also says MiMo-V2.5-Pro can sustain tasks involving more than a thousand tool calls, with improved instruction following inside agentic scenarios and stronger coherence over ultra-long contexts.
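A thousand-plus tool calls implies a driver loop of roughly the following shape. This is a hypothetical sketch of the general agent pattern; `run_agent`, `model_step`, and the tool names are invented for illustration, since Xiaomi has not published its agent interface:

```python
# Hypothetical sketch of a long-horizon agent loop with a tool-call
# budget. None of these names are Xiaomi APIs; they illustrate the
# general pattern behind multi-hundred or multi-thousand-call runs.

def run_agent(model_step, tools, task, max_tool_calls=1500):
    """Drive a model/tool loop until the model declares it is done."""
    history = [("task", task)]
    calls = 0
    while calls < max_tool_calls:
        action = model_step(history)          # model picks the next step
        if action["type"] == "finish":
            return {"result": action["result"], "tool_calls": calls}
        observation = tools[action["tool"]](action["args"])
        history.append((action["tool"], observation))  # feed result back
        calls += 1
    raise RuntimeError("tool-call budget exhausted")

# Toy usage: a "model" that calls a no-op tool three times, then finishes.
def toy_model(history):
    if len(history) > 3:
        return {"type": "finish", "result": "done"}
    return {"type": "tool", "tool": "noop", "args": None}

out = run_agent(toy_model, {"noop": lambda _: "ok"}, "demo")
print(out)  # {'result': 'done', 'tool_calls': 3}
```

The hard part of runs like Xiaomi describes is not the loop itself but staying coherent as `history` grows across hundreds of iterations.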
Real task examples from the launch
The most interesting part of Xiaomi's release is that it did not stop at headline benchmarks. It also published several concrete long-run tasks.
1. SysY compiler in Rust
Xiaomi says MiMo-V2.5-Pro implemented a complete SysY compiler in Rust based on a Peking University compiler course project.
The task included:
- lexer
- parser
- AST
- Koopa IR code generation
- RISC-V assembly backend
- performance optimization
According to Xiaomi, the model completed the task in:
- 4.3 hours
- 672 tool calls
- 233/233 on the hidden test suite
That is one of the strongest concrete examples in the launch because it is not just a benchmark score. It is a full autonomous engineering run with a measurable outcome.
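Dividing the published run figures gives the average cadence of the run. These are derived numbers, not figures Xiaomi reported directly:

```python
# Average cadence implied by the published compiler-run figures
# (4.3 hours, 672 tool calls). Derived arithmetic only.

hours = 4.3
tool_calls = 672

seconds_per_call = hours * 3600 / tool_calls
calls_per_minute = tool_calls / (hours * 60)

print(f"~{seconds_per_call:.0f} s per tool call")        # ~23 s
print(f"~{calls_per_minute:.1f} tool calls per minute")  # ~2.6
```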
2. Full desktop video editor
Xiaomi also says MiMo-V2.5-Pro built a working desktop video editor with:
- multi-track timeline
- clip trimming
- cross-fades
- audio mixing
- export pipeline
The published figures for that run are:
- 8,192 lines of code
- 1,868 tool calls
- 11.5 hours of autonomous work
Again, the main point is not only the final artifact. It is that Xiaomi is stressing long autonomous runs rather than single-turn code generation.
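Dividing the published video-editor totals gives a similar picture of the run's cadence (derived arithmetic, not separately reported numbers):

```python
# Ratios computed from the published video-editor run totals
# (8,192 lines, 1,868 tool calls, 11.5 hours). Derived arithmetic only.

hours = 11.5
tool_calls = 1868
lines_of_code = 8192

print(f"~{lines_of_code / tool_calls:.1f} lines of code per tool call")  # ~4.4
print(f"~{lines_of_code / hours:.0f} lines of code per hour")            # ~712
print(f"~{hours * 3600 / tool_calls:.0f} s per tool call")               # ~22
```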
3. Analog EDA optimization
A third example in the launch covers an analog circuit design task: an FVF-LDO design and optimization loop in TSMC 180nm CMOS.
Xiaomi says MiMo-V2.5-Pro was connected to an ngspice simulation loop and iterated toward a design that satisfied all target metrics, including:
- phase margin
- line regulation
- load regulation
- quiescent current
- PSRR
- transient response
This is a notable example because it moves beyond ordinary coding and into model-plus-simulator closed-loop engineering.
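The general shape of such a loop can be sketched as follows. The "simulator" below is a toy stand-in for ngspice (a real run would generate a netlist and parse simulator output), and the parameter and target values are illustrative, not the actual FVF-LDO design numbers:

```python
# Toy sketch of a model-plus-simulator closed loop. simulate() is a
# fake stand-in for ngspice; the single "bias" knob and the target
# values are invented for illustration.

def simulate(params):
    """Fake simulator: maps one bias knob to two metrics."""
    bias = params["bias"]
    return {
        "phase_margin_deg": 45 + 2 * bias,   # rises with bias
        "quiescent_uA": 100 - 5 * bias,      # falls with bias
    }

TARGETS = {
    "phase_margin_deg": (60, float("inf")),  # at least 60 degrees
    "quiescent_uA": (0, 80),                 # at most 80 uA
}

def meets_targets(metrics):
    return all(lo <= metrics[k] <= hi for k, (lo, hi) in TARGETS.items())

# Closed loop: simulate, check targets, adjust, repeat.
params = {"bias": 0}
for step in range(50):
    metrics = simulate(params)
    if meets_targets(metrics):
        break
    params["bias"] += 1  # crude fixed step standing in for the model's reasoning
print(step, params, metrics)  # converges at bias = 8
```

In the real task the "adjust" step is the model reasoning over simulation results, which is what makes the closed-loop framing more demanding than one-shot code generation.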
Benchmark results
The launch and related Xiaomi materials highlight several benchmark numbers for MiMo-V2.5-Pro.
Published figures include:
- SWE-bench Pro: 57.2
- ClawEval: 63.8
- τ3-Bench: 72.9
Xiaomi's homepage and related summaries also position the model as competitive in ClawEval, GDPVal, and SWE-bench Pro, with the company describing it as approaching Claude Opus 4.6 in demanding agentic scenarios.
A separate public model listing also reports:
- Artificial Analysis Intelligence Index: 54
That matters because it places MiMo-V2.5-Pro in the top tier of current general-purpose frontier models, while the official Xiaomi framing stays focused on agentic and long-horizon workloads rather than broad consumer chat.
Coding and agent-specific evaluation
Xiaomi also references its internal MiMo Coding Bench, which it describes as an in-house evaluation suite for coding inside agent frameworks such as Claude Code.
According to Xiaomi's launch materials, this benchmark covers:
- repo understanding
- project building
- code review
- structured artifact generation
- planning
- software engineering tasks
The company's claim is that MiMo-V2.5-Pro narrows the gap with Claude Opus 4.6 on this benchmark and improves the user experience in real-world coding scenarios.
While Xiaomi has not published a full public technical report for MiMo Coding Bench, the benchmark description itself is useful because it shows what the model was optimized for: not isolated code snippets, but framework-driven development workflows.
Token efficiency
One more technical point in the launch is token efficiency.
Xiaomi says MiMo-V2.5-Pro reaches frontier-tier capability while spending significantly fewer tokens per trajectory on ClawEval. The specific published claim is that it reaches 64% Pass³ on ClawEval using about 70K tokens per trajectory, which Xiaomi says is roughly 40–60% fewer tokens than several leading frontier models at comparable capability levels.
That is an important detail because long-horizon agents are expensive. A model that reaches similar results with fewer tokens has a real systems advantage, even before pricing is considered.
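Working backwards from the 40-60% savings claim and the ~70K figure gives the implied per-trajectory usage of the comparison models. This is derived arithmetic from Xiaomi's claim, not separately published data:

```python
# Implied peer token usage, working backwards from the published claim
# of ~70K tokens per trajectory at roughly 40-60% fewer tokens than
# comparable models. Derived arithmetic only.

mimo_tokens = 70_000
savings_low, savings_high = 0.40, 0.60

peer_at_40pct = mimo_tokens / (1 - savings_low)   # ~117K tokens/trajectory
peer_at_60pct = mimo_tokens / (1 - savings_high)  # ~175K tokens/trajectory

print(f"Implied peer usage: {peer_at_40pct / 1e3:.0f}K to "
      f"{peer_at_60pct / 1e3:.0f}K tokens per trajectory")
```

If those figures hold, a peer model would burn up to 2.5x the tokens per trajectory for comparable results, which compounds quickly over thousand-call runs.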
What is still unclear
Even though Xiaomi has published more concrete examples than many model launches, some technical details are still missing from the public materials.
At the time of writing, Xiaomi has not published a full public technical report for MiMo-V2.5-Pro that would answer questions such as:
- exact MoE routing structure
- layer counts
- hidden dimensions
- attention design
- tokenizer changes
- training data mix
- post-training recipe details
- native multimodality status for the Pro model itself
So the current public picture is strongest on:
- parameter scale
- active parameter count
- context length
- benchmark positioning
- long-horizon task examples
and weaker on the lower-level architectural breakdown that researchers often want.
Bottom line
MiMo-V2.5-Pro is Xiaomi's new flagship agent model, built around 1T total parameters, 42B active parameters, and a 1M-token context window.
The published launch materials make three things clear:
- Xiaomi optimized it for agentic execution and long-horizon coherence, not just chat.
- The model is being judged heavily on software engineering and tool-use tasks.
- Xiaomi wants MiMo-V2.5-Pro to be seen as a frontier-class model for autonomous, multi-step work rather than only single-turn reasoning.
The most important technical facts are the 1T / 42B scale, the 1M context, and the benchmark plus task evidence showing that Xiaomi is pushing hard into the long-running agent category.