Anthropic Ships Claude Opus 4.7 as Mythos Stays Under Lock and Key

Anthropic on Thursday launched Claude Opus 4.7, its most capable commercial AI model yet, and spent much of the launch reminding everyone it has a better one locked in the vault.

Anthropic has released Claude Opus 4.7, the latest iteration of its flagship commercial AI model, extending its lead over OpenAI and Google in agentic coding while quietly drawing a hard line between what the company will sell to the public and what it will not.

The new model is pitched squarely at developers. Anthropic says Opus 4.7 is a “notable improvement” on Opus 4.6 in advanced software engineering, with users reporting that they can confidently hand off their hardest coding work, including tasks that previously needed close human supervision. It handles long-running jobs with more rigor, follows instructions more literally, and devises ways to verify its own outputs before reporting back.

Pricing is unchanged from Opus 4.6 at US$5 per million input tokens and US$25 per million output tokens. The model is live across Anthropic’s API, Amazon Bedrock, Google Cloud’s Vertex AI and Microsoft Foundry, and has already rolled out to GitHub Copilot for Pro+, Business and Enterprise users.

A benchmark lead, but a tight one

Opus 4.7 narrowly retakes the top spot among publicly available frontier models, outscoring OpenAI’s GPT-5.4 and Google’s Gemini 3.1 Pro on key benchmarks covering agentic coding, scaled tool use, agentic computer use and financial analysis. VentureBeat’s Carl Franzen noted that on directly comparable benchmarks Opus 4.7 only leads GPT-5.4 seven-to-four, a reminder that the gap between the major labs is shrinking fast.

 

Current AI Model Benchmarks. Source: Anthropic

The model takes the top spot on SWE-bench Pro and SWE-bench Verified, the headline tests for handling complex engineering work. Early-access testers cited by Anthropic reported outsized gains on their own internal evaluations. Cursor co-founder Michael Truell said the model cleared 70 per cent on CursorBench, versus 58 per cent for Opus 4.6. XBOW chief executive Oege de Moor reported a jump from 54.5 per cent to 98.5 per cent on the firm’s visual-acuity benchmark, a change that, in his framing, effectively eliminates a long-standing pain point for autonomous penetration testing. Rakuten’s Yusuke Kaji said the model resolved three times more production tasks than its predecessor on the Japanese conglomerate’s internal SWE-Bench fork.

Vision is the other headline upgrade. Opus 4.7 can process images up to 2,576 pixels on the long edge, more than three times the resolution of prior Claude models. The change opens the door to use cases that depend on fine visual detail, including computer-use agents parsing dense screenshots and structured data extraction from complex technical diagrams.

The weaknesses Anthropic flags itself

The release notes are unusually candid about where Opus 4.7 falls short. The model does not sweep every category: GPT-5.4 still leads in agentic search, multilingual question answering, and some terminal-based coding tasks. Opus 4.7 also scored fractionally lower than Opus 4.6 on cybersecurity vulnerability reproduction, at 73.1 per cent versus 73.8 per cent, a regression Anthropic attributes to its new automated cyber safeguards.

Migration is not frictionless either. The model uses an updated tokenizer that can map the same input to 1.0–1.35 times as many tokens as Opus 4.6, and it thinks harder at higher effort levels, producing more output tokens on later turns in agentic workflows. Developers may need to re-tune prompts written for earlier models, because Opus 4.7 takes instructions literally where its predecessors interpreted them loosely.
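To put the tokenizer change in budget terms, here is a rough Python sketch that brackets what an existing Opus 4.6 workload might cost on Opus 4.7. It uses the prices and the 1.0–1.35 multiplier quoted in this article. The function name and the choice to apply the multiplier to output tokens as well as input tokens are illustrative assumptions, and the sketch does not model the extra output tokens generated at higher effort levels.

```python
from dataclasses import dataclass

# Prices quoted in this article, in US$ per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00

# Tokenizer range quoted in this article: the same text may map to
# 1.0x to 1.35x as many tokens as it did on Opus 4.6.
TOKENIZER_MULTIPLIER_RANGE = (1.0, 1.35)


@dataclass
class CostRange:
    low_usd: float
    high_usd: float


def estimate_cost_range(input_tokens_46: int, output_tokens_46: int) -> CostRange:
    """Bracket the cost of a workload measured in Opus 4.6 tokens.

    Assumption (not from Anthropic): the multiplier is applied to output
    tokens too, since generated text is counted with the same tokenizer.
    """

    def cost(multiplier: float) -> float:
        input_cost = input_tokens_46 * multiplier * INPUT_PRICE_PER_MTOK
        output_cost = output_tokens_46 * multiplier * OUTPUT_PRICE_PER_MTOK
        return (input_cost + output_cost) / 1_000_000

    low, high = TOKENIZER_MULTIPLIER_RANGE
    return CostRange(low_usd=cost(low), high_usd=cost(high))


if __name__ == "__main__":
    # Hypothetical monthly workload: 2M input tokens, 400k output tokens.
    estimate = estimate_cost_range(2_000_000, 400_000)
    print(f"US${estimate.low_usd:.2f} to US${estimate.high_usd:.2f}")
```

The range is only a planning bound. Real token counts depend on the content being tokenized, so measured usage should replace the estimate once it is available.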

Anthropic’s own alignment assessment rates the model “largely well-aligned and trustworthy, though not fully ideal in its behaviour.” On measures such as honesty and resistance to prompt-injection attacks, Opus 4.7 improves on Opus 4.6. On others, including a tendency to give overly detailed harm-reduction advice on controlled substances, it is modestly weaker.

The Mythos shadow

The more revealing subtext to Thursday’s launch is what Anthropic is not shipping. The company repeatedly positions Opus 4.7 as “less broadly capable than our strongest model, Claude Mythos Preview”, the frontier system unveiled earlier this month under Project Glasswing and restricted to around 40 vetted enterprise and government partners.

As BNC reported last week, Mythos is a system Anthropic believes can autonomously discover and exploit zero-day software vulnerabilities at a scale that exceeds both human researchers and every automated tool in existence. The company is keeping it inside a controlled coalition that includes Apple, Google, Microsoft, Amazon Web Services, CrowdStrike and JPMorgan Chase. Opus 4.7, by contrast, has been deliberately trained with reduced cyber capabilities and ships with safeguards that automatically detect and block requests flagged as prohibited or high-risk cybersecurity use cases.

Gizmodo’s Jake Peterson read the framing bluntly, observing that the Opus 4.7 announcement effectively doubles as marketing for the system Anthropic refuses to sell. Legitimate security researchers can apply for broader access through a new Cyber Verification Program, which Anthropic is pitching as the controlled on-ramp for vulnerability research, penetration testing and red-teaming work.

The dual-track strategy matters beyond the AI industry. Bitcoin was trading near US$74,500 at the time of the Opus 4.7 launch, steady inside the range it has held since the early-April Mythos disclosure. The roughly US$200 billion locked in smart contracts across Ethereum, Solana and other chains sits behind friction-based defences (audits, timelocks, multisig governance) that Anthropic has itself warned become “significantly weaker” against model-assisted adversaries.

What developers get today

Alongside Opus 4.7, Anthropic rolled out a new “xhigh” effort level sitting between high and max, giving developers finer control over the trade-off between reasoning depth and latency. Task budgets entered public beta on the Claude Platform, letting developers cap token spend on autonomous agents to prevent runaway bills on long-running jobs. In Claude Code, a new /ultrareview slash command runs a dedicated review session that flags bugs and design issues of the kind a careful senior reviewer would catch, and the company’s “auto mode”, which lets Claude act without constant permission prompts, has been extended to Max plan subscribers.

For developers weighing the upgrade, Anthropic’s recommendation is to start with high or xhigh effort for coding and agentic use cases, measure token usage on real traffic, and consult the migration guide before rolling the model into production harnesses. The headline is that frontier capability keeps arriving on a two-month cadence, at unchanged prices, while the model Anthropic considers genuinely transformative stays behind closed doors.
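One way to act on the "measure token usage on real traffic" advice is to log token counts per request and compare cost by effort level. The Python sketch below is a minimal illustration of that idea, not Anthropic tooling. The record fields (effort, input_tokens, output_tokens) are a hypothetical logging format, the sample numbers are made up, and the prices are the ones quoted in this article.

```python
from collections import defaultdict
from statistics import mean

# Prices quoted in this article, in US$ per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request at the article's quoted prices."""
    total = input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000


def summarize_by_effort(records):
    """Group logged requests by effort level and summarize their cost.

    `records` is an iterable of dicts with the hypothetical keys
    'effort', 'input_tokens' and 'output_tokens'.
    """
    costs = defaultdict(list)
    for record in records:
        costs[record["effort"]].append(
            request_cost_usd(record["input_tokens"], record["output_tokens"])
        )
    return {
        effort: {
            "requests": len(values),
            "mean_usd": mean(values),
            "max_usd": max(values),
        }
        for effort, values in costs.items()
    }


if __name__ == "__main__":
    # Made-up sample log entries, for illustration only.
    sample = [
        {"effort": "high", "input_tokens": 12_000, "output_tokens": 3_000},
        {"effort": "high", "input_tokens": 8_000, "output_tokens": 5_000},
        {"effort": "xhigh", "input_tokens": 12_000, "output_tokens": 9_000},
    ]
    for effort, stats in summarize_by_effort(sample).items():
        print(effort, stats)
```

Tracking the maximum alongside the mean matters for agentic workloads, where a few long-running requests can dominate the bill.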

Troy Miller