April 19, 2026 · TokenDock Team

Claude Opus 4.7: What Users Are Saying After the Launch

Claude Opus 4.7 is drawing mixed early feedback. Some users see gains in coding and vision, while others report regressions, higher token use, and rough edges.

Anthropic launched Claude Opus 4.7 on April 16, 2026, positioning it as its strongest generally available model for complex reasoning, agentic coding, and higher-resolution vision.

A few days in, the early reaction is clearly mixed.

Some users say Opus 4.7 feels stronger on structured coding work, more literal in instruction-following, and better when given well-defined tasks with higher effort settings. Others say it feels worse than Opus 4.6 in day-to-day use, especially in Claude Code, where complaints include hallucinations, shallow checking, more confirmation loops, and higher token burn.

The most honest reading so far is simple: people are not reacting to Opus 4.7 as a clear across-the-board upgrade. They are reacting to it as a model that may be better in some setups and worse in others.

What Anthropic says changed

Anthropic’s official pitch is straightforward. It says Opus 4.7 improves software engineering, complex multi-step work, and high-resolution vision. The company also introduced a new xhigh effort level, added task budgets in beta, and highlighted new Claude Code features such as /ultrareview and wider auto mode access.

There are two practical changes that matter immediately.

First, Opus 4.7 uses an updated tokenizer, which Anthropic says can map the same input to roughly 1.0x to 1.35x as many tokens as before, depending on content type.

Second, Anthropic says Opus 4.7 tends to think more at higher effort levels, especially in later turns of agent-style workflows. In theory that should improve reliability on hard problems. In practice, some users are finding that it also changes the feel and cost of using the model.
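To make the tokenizer range concrete, here is a minimal sketch of the arithmetic. It assumes the 1.0x to 1.35x multiplier applies directly to an input's previous token count. The function names and the price used in the example are placeholders for illustration, not published figures.

```python
def input_token_range(old_tokens: int, low: float = 1.0, high: float = 1.35) -> tuple:
    """Return the (min, max) token count the same input might map to.

    The default multipliers are the range quoted above; the real
    multiplier depends on content type and is not known in advance.
    """
    return (round(old_tokens * low), round(old_tokens * high))


def input_cost_range(old_tokens: int, price_per_million: float,
                     low: float = 1.0, high: float = 1.35) -> tuple:
    """Return the (min, max) input cost for a prompt.

    price_per_million is a hypothetical price per one million input
    tokens; substitute whatever rate actually applies to you.
    """
    min_tokens, max_tokens = input_token_range(old_tokens, low, high)
    return (min_tokens * price_per_million / 1_000_000,
            max_tokens * price_per_million / 1_000_000)


if __name__ == "__main__":
    # A prompt that used to be 100,000 tokens, at a placeholder price of 10 per million.
    print(input_token_range(100_000))
    print(input_cost_range(100_000, price_per_million=10.0))
```

This covers input tokens only. Any extra thinking tokens at higher effort levels would come on top of it and are not modeled here.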

The positive feedback

The strongest positive feedback is coming from people who use Claude mainly for carefully scoped coding work, especially when prompts are structured and effort is set high enough.

A recurring positive theme is that Opus 4.7 feels more literal and explicit than Opus 4.6. For some users, that is a benefit. It reduces guesswork, makes the model more predictable, and can improve results in pipelines where the prompt is already tight. Early community testing has also suggested that 4.7 may be better at some code-editing tasks, though not by a huge margin.

Another positive theme is that some users think Opus 4.7 is better in Claude Code than on the main chat product. In other words, people who judge it inside agent-style coding sessions seem more likely to see the upside than people using it as a general-purpose thinking model.

There is also a more official positive case from Anthropic’s launch partners and internal evaluations. Anthropic says Opus 4.7 is stronger on long-running coding workflows, more capable on complex engineering tasks, and better at high-resolution visual work such as reading dense screenshots and detailed diagrams.

That does not prove the real-world experience is uniformly better. But it does help explain why some developers are finding it useful even while others are frustrated.

The negative feedback

The criticism is sharper, and easier to state in plain terms.

The most common complaint is that Opus 4.7 feels less reliable than Opus 4.6 in real use. That complaint shows up in multiple forms:

making assumptions without checking files
declaring work complete without actually verifying output
asking for too many confirmations
doing shallow research before recommending destructive changes
feeling more hallucination-prone in code tasks

Several recent GitHub issues in Anthropic’s own Claude Code repo describe exactly those problems. One user says 4.7 hallucinates more than 4.6 and relies on thin information. Another says the model claims a visual task is complete without noticing obviously clipped text. A third reports that the model made a major architectural recommendation after inadequate research. There are also complaints that 4.7 feels too passive or too interruption-heavy, even in auto mode.

That matters because these are not abstract benchmark objections. They are complaints about actual work getting done.

Token costs and workflow friction

Even users who do not hate the model are talking about cost and token behavior.

Anthropic itself warns that the updated tokenizer can increase token counts for the same input. It also says the model may spend more thinking tokens at higher effort levels. A Hacker News discussion about tokenizer costs quickly turned into a broader complaint thread, with some users saying the extra spend does not obviously translate into better outcomes for them.

This does not mean Opus 4.7 is always more expensive in net terms. Anthropic argues that the overall tradeoff is favorable on its internal coding evals. But the early outside reaction shows many users are watching token usage closely and do not yet agree that the extra spend is clearly worth it.
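One way to reason about the net tradeoff is cost per completed task rather than cost per attempt. The sketch below assumes a simplified model in which each attempt succeeds independently at a fixed rate, so the expected number of attempts is one divided by that rate. All numbers in the example are hypothetical, not measurements of either model.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one successful result.

    Assumes independent attempts with a constant success rate,
    so the expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def break_even_success_rate(old_success_rate: float, cost_multiplier: float) -> float:
    """Success rate the costlier model needs to match the old cost per success.

    A result above 1.0 means the new model cannot break even
    under this simplified model.
    """
    return old_success_rate * cost_multiplier


if __name__ == "__main__":
    # Hypothetical: the old setup succeeds 60% of the time, and the new one
    # costs 1.35x as much per attempt (the top of the tokenizer range).
    needed = break_even_success_rate(0.60, 1.35)
    print(round(needed, 2))  # 0.81
```

Under these made-up numbers, an attempt that costs 1.35x as much would need to succeed about 81% of the time to match a 60% baseline. That is the kind of improvement users are implicitly asking the model to justify.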

Why some of the disagreement may be real

Part of the split may come from the fact that different users are asking very different things from the same model.

If your workflow is structured coding, explicit instructions, high effort settings, and agent-style execution, you may be more likely to see Opus 4.7 as an improvement.

If your workflow is interactive reasoning, broad research, or open-ended general problem solving, you may be more likely to notice regressions, hesitation, or shallow checking.

There is also a tooling factor. Some early breakage is not about the model’s raw intelligence, but about integration changes around it. For example, a LiteLLM issue says Opus 4.7 changed default handling of visible reasoning summaries, causing silent degradation for callers that expected 4.6-style reasoning output unless they explicitly set a new display option.

From the user side, though, that distinction only goes so far. If a model is harder to integrate cleanly or behaves differently after upgrade, that still affects the real experience.

What the current consensus looks like

There is no clean consensus yet, but there is a pattern.

People who like Opus 4.7 tend to say things like:

it is better when prompted carefully
it is stronger on coding than on casual chat
it is more explicit and controllable
it works better at higher effort levels
it seems improved for vision-heavy tasks and agent workflows

People who dislike it tend to say things like:

it feels worse than 4.6 in everyday use
it checks less and assumes more
it burns too many tokens
it gets stuck in correction loops
it asks for too much confirmation
it does not feel like a clean upgrade

Those two views are not necessarily contradictory. They may simply describe a model that has been tuned harder for certain production use cases while becoming less satisfying in others.

Bottom line

Right now, Claude Opus 4.7 looks like a model with real strengths and real tradeoffs.

The positive case is credible: some users do see gains in structured coding, instruction-following, and higher-effort agent workflows. Anthropic also has a solid official story around coding, vision, and long-running tasks.

The negative case is credible too: there are enough user reports of hallucinations, shallow verification, extra confirmation loops, integration friction, and token-cost concerns that it would be wrong to describe the launch as universally well received.

So the fairest conclusion today is not that Opus 4.7 is a win or a failure.

It is that Opus 4.7 is polarizing because the real experience depends heavily on how you use it.