Ant Group has officially launched Ling-2.6-flash, a new large language model built around fast inference, token efficiency, and real-world agent workloads.
The naming matters here. Before the official launch, the model was tested anonymously on OpenRouter under the name Elephant Alpha. Ant has now confirmed that Elephant Alpha was the test version of Ling-2.6-flash. That means anyone who saw Elephant Alpha trending over the past week was already looking at this model before it had a public name.
Ling-2.6-flash and Elephant Alpha are the same model line
The easiest way to understand the naming is this:
- Ling-2.6-flash is the official model name
- Elephant Alpha was the anonymous pre-release test name
This matters for search and discovery because many people first heard about the model through Elephant Alpha. Now that the official release is out, both names point to the same launch story.
What Ling-2.6-flash is
Ling-2.6-flash is an Instruct model from Ant Group's Ling family.
Ant says the model has:
- 104 billion total parameters
- 7.4 billion active parameters per token
That points to a sparse mixture-of-experts (MoE) design: instead of activating the full model on every token, only a small subset of expert weights fires for each one. In practice, the goal is to keep the model strong enough for demanding tasks while staying cheaper and faster to serve than a dense model at the same scale.
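To make the total-versus-active distinction concrete, here is a toy parameter-count sketch for a top-k MoE layer. The expert count, expert size, and shared-parameter size below are invented for illustration (chosen so the ratio resembles the 104B/7.4B headline figures); they are not Ling-2.6-flash's real configuration.

```python
# Toy sketch only: why an MoE model's "active parameters" per token can be
# far smaller than its "total parameters". All sizes below are hypothetical.

def moe_param_counts(num_experts: int, top_k: int,
                     expert_params: int, shared_params: int) -> tuple[int, int]:
    """Return (total_params, active_params_per_token) for a toy MoE model."""
    total = shared_params + num_experts * expert_params
    active = shared_params + top_k * expert_params  # only k experts fire per token
    return total, active

# Hypothetical configuration picked to mirror the 104B-total / 7.4B-active ratio.
total, active = moe_param_counts(num_experts=64, top_k=4,
                                 expert_params=1_610_000_000,
                                 shared_params=960_000_000)
print(f"total: {total / 1e9:.1f}B, active per token: {active / 1e9:.1f}B")
```

The serving win comes from the `active` number: per-token compute scales with the experts actually routed to, not with everything stored in memory.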
The architecture direction
Ant is positioning Ling-2.6-flash as part of its broader efficiency-first model strategy.
Recent coverage and the official launch framing say the model is built on a hybrid linear MoE architecture (in such designs, linear-attention layers are typically mixed with standard attention layers alongside the sparse experts). The point of that design is not simply to make the model bigger: it is meant to improve throughput, inference cost, and serving efficiency, especially in longer, tool-heavy workloads.
That framing is an important part of the story: Ant is not pitching Ling-2.6-flash as a benchmark-only model, but as one meant to hold up in production.
Speed and inference efficiency
This is one of the strongest parts of the launch.
Ant says Ling-2.6-flash can reach:
- up to 340 tokens per second on 4× H20 GPUs
- prefill throughput 2.2× that of Nemotron-3-Super
The company is also emphasizing token efficiency, which means the model is supposed to stay competitive without needing extremely long outputs to get there.
That matters because some models improve benchmark scores partly by generating far more tokens. Ant is clearly trying to tell a different story here: Ling-2.6-flash is meant to be useful not only because it is capable, but because it is efficient to run.
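As a rough illustration of what the headline decode speed and token efficiency mean together, here is a back-of-the-envelope latency sketch. Only the 340 tokens/second figure comes from the launch claims; the response lengths are hypothetical, and the estimate ignores prefill and network overhead.

```python
# Back-of-the-envelope sketch: wall-clock decode time at a given tokens/sec,
# and how a shorter ("token-efficient") answer compounds the speed advantage.
# Response lengths below are made-up workload assumptions.

def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Decode-time estimate only; prefill and network latency are ignored."""
    return output_tokens / tokens_per_second

verbose = generation_seconds(2_000, 340)  # a long, padded answer
concise = generation_seconds(500, 340)    # an efficient answer to the same task
print(f"verbose: {verbose:.1f}s, concise: {concise:.1f}s")
```

The same throughput delivers a response four times sooner when the model does not pad its output, which is the practical content of the token-efficiency claim.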
Built for agent workloads
The second big theme is agents.
Ant says Ling-2.6-flash was specifically improved for:
- tool calling
- multi-step planning
- task execution
- coding-agent workloads
- longer-horizon agent behavior
The published benchmark list reflects that focus. Ant highlights results on:
- BFCL-V4
- TAU2-bench
- SWE-bench Verified
- Claw-Eval
- PinchBench
These are not generic chat benchmarks. They are much more aligned with structured execution, coding, and tool use.
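The loop these benchmarks exercise can be sketched schematically: the model proposes a tool call, a harness executes it, and the result is fed back until the model produces a final answer. Everything below, including the tool names and the stand-in "model", is invented for illustration; it is not Ant's API or any benchmark's actual harness.

```python
# Schematic agent loop: model emits a JSON tool call, harness runs the tool,
# the result becomes the next observation. All names here are hypothetical.
import json

TOOLS = {
    "add": lambda args: args["a"] + args["b"],
    "upper": lambda args: args["text"].upper(),
}

def fake_model(observation: str) -> str:
    """Stand-in for an LLM: first emits a tool call, then a final answer."""
    if observation == "start":
        return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})
    return json.dumps({"final": f"The sum is {observation}"})

def run_agent() -> str:
    observation = "start"
    for _ in range(5):  # step cap: the longer-horizon safety valve
        step = json.loads(fake_model(observation))
        if "final" in step:
            return step["final"]
        result = TOOLS[step["tool"]](step["args"])  # execute the tool call
        observation = str(result)                   # feed the result back
    return "gave up"

print(run_agent())
```

Benchmarks like BFCL-V4 and TAU2-bench score the model on exactly the two failure-prone steps in this loop: emitting well-formed tool calls, and planning sensibly across multiple iterations.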
Pricing and API details
Ant has already opened API access for Ling-2.6-flash.
Published launch coverage says the pricing is:
- $0.10 per 1M input tokens
- $0.30 per 1M output tokens
Reports also say the launch included a one-week free API trial, with continuing free quota available on Ant's own platform afterward.
That pricing is part of the product story. Ant is not only marketing Ling-2.6-flash as technically efficient. It is also pricing it to support high-volume usage.
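To see what those rates imply at volume, here is a quick cost sketch at the published $0.10/$0.30 per million tokens. The request shape (prompt and reply lengths, request count) is a hypothetical workload, not anything Ant has published.

```python
# Cost sketch at the published per-token rates. Workload sizes are hypothetical.
INPUT_PER_M = 0.10   # USD per 1M input tokens
OUTPUT_PER_M = 0.30  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the published rates."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# e.g. one million requests, each a 2K-token prompt with a 500-token reply
daily = 1_000_000 * request_cost(2_000, 500)
print(f"${daily:.2f} per million such requests")
```

At those rates a million moderately sized requests lands in the hundreds of dollars, which is the "priced for high-volume usage" point in concrete terms.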
Why the Elephant Alpha connection matters
The Elephant Alpha name helped the model gain attention before launch.
Official launch coverage says the anonymous OpenRouter test version attracted heavy usage and trended for several consecutive days before Ant revealed the model's real name. That early interest matters because it suggests the model was already getting real traffic and developer curiosity before the formal announcement.
So from a market perspective, Elephant Alpha was the teaser name that built momentum, and Ling-2.6-flash is the official release that explains what people were actually using.
Bottom line
Ling-2.6-flash is Ant Group's newly launched 104B MoE instruct model, and Elephant Alpha was its anonymous pre-release test version.
The launch is interesting for three reasons:
- the model combines 104B total parameters with only 7.4B active parameters
- Ant is pushing a strong message around token efficiency and fast inference
- the model is clearly aimed at agent and coding workloads, not just chat
If you saw Elephant Alpha trending, this is the official identity behind that model.