What security leaders should take from the Fable 5 release


Anthropic shipped Claude Fable 5 this week, the first Mythos-class model anyone can use. I spent the evening with the announcement, the eval data, and the early coverage. This is my read on what the release means if you run a security program.

Some brief context if you haven't been tracking Mythos closely. For the past few months, some of the strongest AI cyber capabilities in existence have been restricted to Project Glasswing, a program for government and critical infrastructure operators run in partnership with the US government. Everyone else has been reading about it from the outside. Fable 5 is the same underlying model as Mythos 5, made generally available with safeguards in cybersecurity and biology. Mythos 5 itself, with the cyber safeguards lifted, goes only to Glasswing participants. These are two different things, and many headlines are blurring them.

What you actually get with Fable 5

Anthropic published offensive cyber eval data with the launch. For security practitioners, it’s the most useful chart in the release.
Source: Anthropic

Mythos 5 succeeds on the Firefox vulnerability evaluation 88.4% of the time, and hits 83.8% on CyberGym. Fable 5 scores 0.0 across the board.

The mechanism behind that zero is the safeguards, and they matter for how you'll use the model day to day. The safeguards aren't refusals; they're classifiers sitting in front of the model. When one trips, your query gets answered by Opus 4.8 instead, and you're told it happened. On the API you configure this through a new Fallback API. Anthropic says fallbacks fire in under 5% of sessions, which sounds reassuring until you remember that security teams are not a representative sample of sessions.

The early signal is that the routing is broad. A SANS researcher reported that ordinary defensive work (incident response, detection engineering, basic forensics) got bounced to Opus 4.8 in his initial testing. Anthropic's own framing is that the classifiers are intentionally conservative while they tune. Put plainly: the classifier doesn't know you're a defender; it only sees a security task.

So will Fable 5 find vulnerabilities if you point it at your codebase? Sort of. The capability is there, but deliberate vuln-hunting is exactly the workload the classifiers exist to catch, so what you'll actually get on that task is Opus 4.8's answer. That’s not nothing; last-generation models find real bugs, just at last-generation rates. Incidental findings during normal code review probably won’t trip the classifiers, which means that the work around the vulnerability — threat modeling, architecture review, detection logic, investigation support — can, in theory, run at full Fable capability.

If a peer asked me how to roll this out, I'd say: use Fable 5 where the reasoning depth pays off and the security classifier won't fight you. Send explicitly security-domain work to Opus 4.8 on purpose rather than waiting to get bounced mid-task. And instrument your fallback rate, because that tells you which of your workflows live behind the gate and it becomes your business case for trusted access later.
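To make "instrument your fallback rate" concrete, here is a minimal sketch of the bookkeeping. It assumes your own client code records, per session, a workflow label you assign and whether a fallback notice was observed. The `SessionRecord` type, its field names, and the workflow labels are hypothetical; nothing here reflects the actual shape of the Fallback API, which I haven't seen documented in detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SessionRecord:
    workflow: str    # hypothetical label you assign, e.g. "code-review"
    fell_back: bool  # assumption: your client logs True when a fallback was reported


def fallback_rates(records):
    """Return {workflow: fraction of its sessions that fell back}."""
    totals = defaultdict(int)
    fallbacks = defaultdict(int)
    for record in records:
        totals[record.workflow] += 1
        if record.fell_back:
            fallbacks[record.workflow] += 1
    return {workflow: fallbacks[workflow] / totals[workflow] for workflow in totals}


# Illustrative data only, not real measurements.
records = [
    SessionRecord("code-review", False),
    SessionRecord("code-review", False),
    SessionRecord("detection-engineering", True),
    SessionRecord("detection-engineering", False),
    SessionRecord("incident-response", True),
]

rates = fallback_rates(records)

# List workflows from most to least gated.
for workflow, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{workflow}: {rate:.0%}")
```

Breaking the rate out per workflow, rather than tracking one global number, is the point: it shows which workloads to send to Opus 4.8 on purpose and which ones belong in a trusted access request.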

Full Mythos 5 capabilities should continue expanding to defenders

Glasswing has already made the case for this. In the first weeks of the program, Anthropic and roughly 50 partners found more than ten thousand high- or critical-severity vulnerabilities in systemically important software. Cloudflare reported around 2,000 bugs. Mozilla found and fixed 271 in a single Firefox release — over ten times what the previous model caught. Last week the program grew to about 150 organizations across more than 15 countries, still centered on government and critical infrastructure.

It's worth noting that the release doesn't commit to much beyond making Mythos 5 available to existing Glasswing participants. Biology researchers come later, and Anthropic intends to expand access through a broader trusted access program for vetted cybersecurity organizations. That is an intention, with no timeline and no eligibility criteria. I'm noting that not to be cynical but because it's the part of the launch I most want to see them follow through on.

The current definition of who counts as a defender is too narrow. Government and critical infrastructure was the right place to start. But enterprise security teams protecting platforms that hold massive amounts of business data, SaaS providers whose compromise would cascade through thousands of customers, and the open source maintainers underneath all of it are legitimate defenders too, and attackers are not waiting for a trusted access program. Sooner or later some version of this capability will leak, be replicated by a less careful lab, or be rebuilt by a well-resourced adversary, and defenders need to be prepared.

The retention condition

Mythos-class access comes with a condition that's getting a lot of attention: mandatory 30-day data retention on all traffic, including for customers with existing zero-data-retention agreements. Anthropic says the data isn't used for training; it exists to catch novel attacks and jailbreaks that don't show up in a single exchange.

Every company adopting these models has to make a deliberate call here. The questions I'd work through: Is this a one-off term, or the standard safety architecture for frontier models from here on? My read is the latter, which changes the decision — you're not negotiating one model provider’s clause, you're deciding whether your data governance can live with where the industry is going. Does the no-training commitment cover the exposure you actually care about? Which of your data classifications were predicated on zero retention, and does 30-day safety retention break any of them? And is the monitoring rationale credible — can a vendor realistically defend a model this capable without cross-session visibility? I don't think they can, which doesn't make the tradeoff free, just real.

Whatever you decide, decide it on purpose. Document the risk acceptance, review it against your classifications, reflect it in your sanctioned-tool guidance. The risk is backing into acceptance because your engineers wanted the model by Wednesday.

The question we should really be asking

When trusted access opens up, security teams that aren't ready will struggle to absorb what the model finds.

Picture day one: a Mythos-class model starts surfacing real vulnerabilities in your environment at an 80%+ hit rate. Who triages that? At what volume? Through what remediation pipeline? If the answer is your legacy vulnerability management process, you're going to drown. Finding vulnerabilities was never the bottleneck. The bottleneck has always been remediation, and the Glasswing numbers say discovery just got dramatically cheaper while remediation still runs on human time.

This is why we've invested in machine-speed attack mitigation, patching, and vulnerability remediation. The bet was always that discovery would outpace human-speed patching, and that bet is getting a lot more concrete. A frontier discovery engine feeding a human-speed remediation process doesn't reduce your risk on its own; it gives you a bigger backlog and better-documented exposure.

The preparation is unglamorous and none of it requires waiting for access: remediation automation, triage that can handle machine-scale finding volume, clear ownership for what the model surfaces, real metrics on fix velocity. All of it determines whether access is worth anything when you get it.
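For "real metrics on fix velocity," a baseline you can compute today looks something like the sketch below. It assumes you can export each finding with a discovery date and, if it has been remediated, a fix date. The `Finding` type, its field names, and the sample dates are hypothetical stand-ins for whatever your tracker exports.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Finding:
    found: date
    fixed: Optional[date]  # None while the finding is still open


def fix_velocity(findings):
    """Return (median days to fix among closed findings, number still open)."""
    days_to_fix = [(f.fixed - f.found).days for f in findings if f.fixed is not None]
    open_count = sum(1 for f in findings if f.fixed is None)
    median_days = median(days_to_fix) if days_to_fix else None
    return median_days, open_count


# Illustrative data only, not real measurements.
findings = [
    Finding(date(2025, 1, 1), date(2025, 1, 4)),
    Finding(date(2025, 1, 2), date(2025, 1, 12)),
    Finding(date(2025, 1, 3), date(2025, 1, 8)),
    Finding(date(2025, 1, 5), None),
]

median_days, still_open = fix_velocity(findings)
print(f"median days to fix: {median_days}, still open: {still_open}")
```

Tracking the open count alongside the median matters because a median computed only over closed findings can look healthy while the backlog grows, which is exactly the failure mode a machine-speed discovery engine would expose.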

Where I land

This is a well-executed release of a powerful capability class. Gate the highest-risk capabilities, expand deliberately, instrument everything. Glasswing proved the model, and the eval data says the gates work. My concerns aren't about the release; they're about how unprepared most security programs are for what comes after it. The defenders who come out ahead won't be the ones who get Mythos access first. They'll be the ones whose programs were already built to act on what these models find.

That work can start now.