Archive for Anthropic

Anthropic Restores Claude Fable 5 After U.S. Lifts Jailbreak-Linked Export Controls

Posted in Commentary with tags on July 1, 2026 by itnerd

Anthropic is putting Claude Fable 5 back online worldwide. On June 30, the U.S. Commerce Department lifted the export controls it had imposed on Fable and its more tightly controlled sibling Mythos 5 about two and a half weeks earlier.

Fable 5 returns to users on Wednesday, July 1, across Claude.ai, the Claude Platform, Claude Code, and Claude Cowork.

Export controls restrict who can receive or use a technology. The June 12 order told Anthropic to cut off both models for any foreign national, inside or outside the United States, including its own non-citizen staff.

Commenting on this is Mayur Upadhyaya, CEO at APIContext:

“The restoration of Claude Fable 5 is welcome news. However, many organizations discovered they had unintentionally created a single point of failure in their AI strategy.

Where workflows had automatically adopted the latest Anthropic model, removing Fable didn’t always result in graceful degradation. In some cases, automations failed silently because there was no fallback, no cold restart, and no operational awareness that the dependency had changed.

This isn’t a criticism of Anthropic. The pace of innovation from the foundation model providers is extraordinary. But it does highlight that enterprises are beginning to treat AI models as operational infrastructure rather than productivity tools. Every infrastructure dependency eventually changes. Models are updated, withdrawn, restricted, or superseded.

The question for organizations is no longer whether they’ll use frontier AI. It’s whether their workflows continue to operate when those underlying dependencies inevitably change. As AI becomes part of critical business processes, resilience needs to evolve beyond model performance. We need to verify that the transactions built on top of these models continue to perform and conform, even when the infrastructure beneath them changes.”

I’m going to go out on a limb and suggest that the US didn’t really have much of a choice but to release Claude Fable 5. But that doesn’t mean that what AI generates no longer needs to be validated. In fact, I would argue that it doubles the need for validation.

Claude Reports Major Outage Across Multiple Models 

Posted in Commentary with tags on June 23, 2026 by itnerd

You may have noticed that Claude AI has had an outage today. 9to5Google reports the following:

Anthropic says it is aware of an outage and has rolled out a fix as recent as 10:53 a.m. ET. The company’s server status website indicates an issue affecting multiple models occurred at 10:19 a.m. ET.

The status update doesn’t detail which models were affected, though attempts to get a response from Sonnet and Opus returned nothing. Those models seem to be the most commonly used, especially as Fable 5 was recently pulled from user access.

The current outage did, however, affect those models across all platforms except for Claude for Government. That includes claude.ai, Claude Console, Claude Code, and Claude API. The total outage time comes close to an hour and stands out as one of the largest outages to hit Anthropic within the past 60 days.

Commenting on this news is Jamie Beckland, Chief Product Officer at APIContext:

“Ready or not, AI inference is now production infrastructure. Enterprises are no longer using these systems only for experiments or side projects. They are putting AI into customer support, coding workflows, analytics, operations and decision support. When an inference endpoint slows down, throws errors or goes unavailable, that can now break a real business process.

Enterprises must run AI with the same discipline they apply to payments, cloud, APIs and other critical services. That means continuously monitoring inference endpoints for latency, error rates, model availability, response quality and regional performance. It also means having a tested failover plan before the outage happens.

Applications with one model provider hardcoded create a single point of failure. A more resilient approach is to design AI systems with fallback models, backup providers, graceful degradation and clear routing rules. Not every task needs the same model. If the primary model is unavailable, some workloads can move to another frontier model, some can fall back to a smaller model, and some should pause rather than return a bad answer.

Six months ago, these tools were enterprise experiments. Now, AI resilience is part of operational resilience.”
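To make the "clear routing rules" idea concrete, here is a minimal sketch of what such a rule set could look like. The task types and model names are placeholders invented for illustration, not real product identifiers, and a production system would also need health checks and logging that this leaves out.

```python
from typing import Optional

# Hypothetical routing table: each task type lists models in preference order.
# All names here are illustrative placeholders, not real model identifiers.
ROUTES = {
    "code_review": ["primary-frontier", "backup-frontier"],
    "ticket_summary": ["primary-frontier", "small-model"],
    "financial_decision": ["primary-frontier"],  # no fallback: pause instead
}


def pick_model(task: str, available: set[str]) -> Optional[str]:
    """Return the first available model for the task, or None to pause."""
    for model in ROUTES.get(task, []):
        if model in available:
            return model
    return None  # nothing suitable is up, so the workload should pause


if __name__ == "__main__":
    # Simulate an outage of the primary model.
    up = {"backup-frontier", "small-model"}
    for task in ROUTES:
        print(task, "->", pick_model(task, up) or "PAUSE")
```

The point of returning `None` rather than silently picking any model is the last option in the quote: some workloads should stop and alert someone instead of returning a bad answer.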

If you rely on AI as part of your business, then you need to plan for downtime. Why? Downtime is part of the game, and if you aren’t prepared for it, the business processes that depend on AI will break when it happens.

Anthropic’s Claude Fable 5 Pulled From The Market

Posted in Commentary with tags on June 16, 2026 by itnerd

Something that I missed last week is that Anthropic, which has had a testy relationship with the government, released Claude Fable 5 and then pulled it shortly afterward:

The AI lab said in a statement that the federal government told it Friday afternoon that it had become aware of a way of “jailbreaking” Fable 5, bypassing limits that Anthropic had implemented to reduce the risk the model could be misused. When Anthropic first announced Mythos, it released the software to only a select group of government agencies and technology professionals because of its ability to uncover cybersecurity vulnerabilities. 

The government imposed what are known as export controls on the products, which Anthropic said means it had to suspend access to the two models by any foreign national, whether inside or outside of the US. The only way it could do so is by shutting the models down entirely, the company said.

So what is Claude Fable 5? I will let the company itself explain:

Claude Fable 5 is a Mythos-level model built for your most ambitious, long-running projects. Try problems you weren’t able to solve with other models. Claude Fable 5 is thorough, proactive, and tests its own work.

Scary stuff. Chris Nyhuis, CEO of the cybersecurity company Vigilant, had this comment, including on the claim that Amazon was behind this:

A jailbreak is when someone gets an AI model to step around the safety limits its maker built in. In our work that matters because the same capability that lets a model find and fix a vulnerability in a client’s code is the capability that can hand an attacker a roadmap. It’s dual-use, like most powerful tools.

Did a “jailbreak” even happen or did Amazon make it up? 

From my perspective it is not even clear a real jailbreak happened. What was demonstrated was a model being asked to read code and fix the flaws in it. That is not someone breaking the guardrails; that is the exact job we hire these tools to do. By the maker’s own account the vulnerabilities were minor and already findable with other models. We pulled a national defensive asset off the field over a finding that, on the public record, looks more like normal defender work than a weapon.

What are the ramifications from the White House to Wall Street to Main Street?

This was the first time a government pulled a commercial AI model off the market over a cyber capability. That sets a precedent every CISO, cloud provider, and investor now has to price in. When access to your best defensive tool can disappear in ninety minutes by directive, that is a board-level risk, not just an engineering one.

Has the White House overstepped and weakened cybersecurity nationally? 

The cybersecurity defender’s argument is straightforward. America’s adversaries are not waiting for an export license. If we slow the people defending American networks while the attackers keep moving, we have made the gap worse, not better. The honest version is that this is a genuinely hard tradeoff, and reasonable people in my field disagree on where the line sits.

How do we know what to trust from AI and if cybersecurity can protect us from hackers jailbreaking? 

Tools come and go, but the harder problem is the people. In the cyber world we hand a small number of people the keys to everything: the networks, the source code, the detection systems. As a nation we have to be far better at making sure the people in those seats are vetted, trusted, and genuinely on our side. That is not about where someone was born. It is about whether we have done the work to earn confidence that the person holding the keys is aligned with the mission. Right now we lean too hard on the technology and not nearly hard enough on the trust model around the people who run it.

Anthropic’s Fable 5 release signals a new approach to AI safety

Posted in Commentary with tags on June 10, 2026 by itnerd

Anthropic’s release of Claude Fable 5 highlights a significant shift in how advanced AI systems are being deployed. Rather than limiting capability, the company is separating access and safety controls from the underlying model itself, making powerful AI available for general use while restricting higher-risk applications through additional safeguards and controlled access programs. The approach reflects a broader challenge facing the industry: how to balance increasingly capable AI systems with the governance, oversight, and usage controls needed to prevent misuse in sensitive areas such as cybersecurity.

Gidi Cohen, CEO & Co-founder, Bonfy.AI

“The most honest thing Anthropic has done here is ship one model as two products. Splitting Fable 5 and Mythos 5 is an acknowledgment that capability and safety are in genuine tension — and that pretending otherwise doesn’t serve anyone.

But the most important line in the entire announcement isn’t about the classifiers. It’s buried in the operational detail: a high-severity vulnerability found by the model takes about two weeks to patch on average. Meanwhile, Mythos Preview built working exploits from a disclosed CVE in under a day.

That gap is where risk lives. And no classifier closes it.

This makes concrete what the CSA data showed last week: enterprises aren’t failing because they can’t detect vulnerabilities. They’re failing because they can’t act on them fast enough. AI has collapsed the attacker’s timeline to hours. The defender’s timeline hasn’t moved.

Anthropic is right that the defensive head start only matters if the industry uses it. The harder truth is that most enterprises aren’t yet equipped to — not because the tools don’t exist, but because the governance architecture to deploy them safely hasn’t kept pace with the capability.

That’s the real race.”

Yagub Rahimov, CEO, Polygraf AI

“Splitting one model into two products, separated by a safety layer rather than by capability, is a genius marketing and GTM strategy. With this approach Anthropic admits publicly that LLMs have dangerous capabilities, and frankly speaking every enterprise should therefore question who governs access to these LLMs. Every enterprise leader should have this sort of honesty as a base standard.

This admission about AI risk also changes the conversation. Imagine that within just days of its launch a single model autonomously finds vulnerabilities that survived 27 years of every human review in a major operating system. The strategic question we should ask is no longer how powerful that model is. It is who controls the behavioral layer between the model and the mission. America has been leading the world in building frontier AI. Now, our next obligation is to lead in governing and securing how that AI behaves once it touches enterprise and government data. Capability won the first race. Governance and security wins the second.”

Organizations need to keep their security and governance practices in step with releases such as Claude Fable 5 so that they aren’t overwhelmed. If they don’t, then you can expect that they will lose this battle.

UPDATE: I have additional commentary starting with Ryan McCurdy, VP of Marketing, Liquibase:

   “Anthropic’s release shows the industry is starting to separate model safety from deployment safety. That is the right conversation. A more capable coding model can be safer at the model layer and still create risk once it is connected to repositories, pipelines, cloud environments, and databases.

   “The enterprise question is not just whether the model has safeguards. It is whether the organization can prove control over the work the model produces. Who approved the change? What systems did it touch? Did it follow policy? Can it be traced and reversed if it breaks production? As models get better at long-running software tasks, governance has to move closer to the actual change, especially in the systems where code, data, and compliance meet.”

Jacob Krell, Senior Director: Secure AI Solutions & Cybersecurity, Suzu Labs:

   “Anthropic filed for its IPO on June 1 and launched Fable 5 eight days later at double the Opus token rate. The benchmark gains are real but concentrated in frontier-hard tasks. SWE-bench Pro jumps 11 points, from 69.2% to 80.3%. On routine work the gap shrinks to near-parity, and cost-per-solve still favors Opus 4.8 at $1.45 vs $2.49 per solved task.

   “The token economics compound the pricing. Fable 5 burns tokens at twice the Opus rate. A BleepingComputer reviewer exhausted a $100 daily allocation in nine minutes running Anthropic’s workflow mode. At $10/$50 per million tokens, heavy agentic work can clear three figures a day.

   “I do complex offensive cybersecurity tasks on Opus 4.6. No cybersecurity classifier. No mandatory data retention. Fable 5 charges double, blocks those queries, and redirects them to Opus 4.8.

   “Anthropic needs to show public-market investors it can monetize a $965 billion valuation. Fable 5 doubles per-token revenue. The cybersecurity gains are locked behind Project Glasswing.

   “Everyone else pays double and gets Opus 4.8 responses on security queries.”
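For a sense of how the quoted $10/$50 per million token pricing reaches three figures a day, here is a small back-of-the-envelope sketch. The rates come from the commentary above; the daily token volumes are hypothetical numbers picked purely for illustration.

```python
# Rates quoted in the commentary: $10 per million input tokens,
# $50 per million output tokens.
INPUT_RATE_PER_M = 10.0
OUTPUT_RATE_PER_M = 50.0


def daily_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars for one day's token usage."""
    input_cost = input_tokens / 1_000_000 * INPUT_RATE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical heavy agentic day: 4M input tokens, 1.5M output tokens.
    print(f"${daily_cost(4_000_000, 1_500_000):.2f}")  # prints $115.00
```

Output tokens dominate at these rates: in this assumed example, 1.5 million output tokens cost more than 4 million input tokens.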

Noelle Murata, Chief Operating Officer at Xcape, Inc.

   “Anthropic’s broad commercial release of Claude Fable 5 represents a calculated pivot in the frontier AI landscape: attempting to monetize elite, long-horizon reasoning architecture while strictly walling off its most “hazardous” capabilities. By implementing an aggressive, real-time classifier system that automatically downgrades high-risk cybersecurity, biochemical, or model-distillation requests to the less powerful Claude Opus 4.8 framework, Anthropic is trying to fulfill its commercial obligations without turning a public LLM into an on-demand zero-day factory.

   “However, this bifurcated release strategy highlights a growing divergence in enterprise defense. While everyday enterprise customers gain access to Fable 5’s highly advanced software engineering and long-running autonomous logic, Claude Mythos 5 remains exclusively accessible to a tight cohort of government intelligence agencies and select critical infrastructure defenders under Project Glasswing. This means the actual “cybersecurity tier” of this technology remains behind sovereign closed doors, leaving commercial security teams to defend against an increasingly automated threat landscape without the same unrestricted analytical tools being deployed by nation-state actors.

   “Critical Takeaways

  •    “The Fallback Safety Loop: Fable 5 relies on active routing classifiers; roughly 5% of user prompts trigger a silent safety downgrade to Opus 4.8, creating an intentional, built-in performance ceiling on sensitive technical domains.
  •    “The Defensive Technology Asymmetry: By maintaining a fully un-guardrailed “Mythos 5” tier strictly for government and certified infrastructure partners, the gap between state-level cyber capabilities and commercial enterprise defense tools is widening.
  •    “Commercially Prohibitive Intelligence: At $10 per million input and $50 per million output tokens, Fable 5 is priced as a premium, specialized tool—making it twice as expensive as Opus 4.8 and reinforcing that frontier-level autonomous reasoning remains a luxury tier for enterprise workflows.

   “Anthropic built a brilliant system to prevent script kiddies from generating bioweapons, but blocking offensive cyber requests simply ensures that the good guys are the only ones playing with handcuffs on.”

John Strand, Owner, Black Hills Information Security, Inc.:

   “We need to remember that Mythos is not the end state. Mythos is a harbinger of what’s coming next. Too many people look at these demonstrations and assume they’re seeing the finished product. They’re not. They’re seeing the beginning.

   “Every major AI vendor on the planet is investing heavily in capabilities that will eventually compete in this space. At the same time, open-source models continue to improve at an astonishing pace. It won’t be long before anyone can download a model from an open-source repository, run it locally, and achieve exploit development, vulnerability research, and attack-path analysis capabilities that rival or exceed what we’re seeing from the most advanced systems today.

   “The real lesson isn’t that Mythos exists. The real lesson is that these capabilities are becoming democratized. What is currently available to a handful of well-funded organizations today will eventually be available to everyone. The barriers to sophisticated vulnerability discovery, exploit development, and attack-path chaining are falling rapidly, and defenders need to start planning for a world where advanced offensive capabilities are widely accessible.”

Sunil Gottumukkala, CEO, Averlon:

   “Fable 5 represents a meaningful shift in what’s possible for code generation at scale. Models at this capability level can compress months of engineering work into days, which changes the economics of vulnerability exposure and remediation significantly.

   “That makes it even more important for organizations to understand their attack surface, know which vulnerabilities are actually exploitable in their environment, what they connect to, and which ones warrant that fix-generation capacity in the first place. The most effective approach evaluates risk as changes are introduced, not after they’ve already reached production.

   “As the dual forces of code generation and exploit generation become faster and cheaper, the triage layer becomes the critical bottleneck to ensure the right risks are prioritized and fixes are in place before a breach.”

Anthropic’s Glasswing rollout is a good start — but access isn’t the same as ongoing security 

Posted in Commentary with tags on June 2, 2026 by itnerd

Anthropic is expanding access to its most advanced frontier model, Mythos, to roughly 200 organizations through Project Glasswing.

Through the expansion, access to Claude Mythos Preview — Anthropic’s model for identifying software vulnerabilities in codebases — will be granted to around 150 additional organizations, all of which must clear security requirements before joining. Participating organizations now span more than 15 countries, with Anthropic signaling plans to broaden that geographic footprint going forward.

Justin Beals, CEO & Founder, Strike Graph, an AI-native GRC and compliance management platform:

“Controlled rollout of frontier AI is the right instinct. But opacity is not a security strategy. Anthropic has published some metrics, and that’s a start, but the validation methodology is self-selected. They chose which findings to send for independent review, and the reviewers were contractors they hired. The broader security community needs access to independent, third-party evaluation across the full corpus. As these tools become more capable, the organizations cleared to use them become high-value targets. Access without continuous compliance validation is just a slower version of the same risk. Whoever gets access, the standard should be verifiable transparency, not curated receipts.”

I for one am cautiously optimistic. But I have to see more in terms of controls coming from Anthropic before I feel 100% comfortable.

Anthropic quietly patches Claude Code sandbox issue

Posted in Commentary with tags on May 20, 2026 by itnerd

Anthropic quietly patched a sandbox bypass vulnerability in Claude Code without public disclosure, leaving developers and security teams unaware that the agentic coding tool they were running had a containment flaw. The silent fix reflects a broader pattern: as AI coding agents are rapidly adopted into developer workflows, the security posture of those tools is often opaque — even to the vendors shipping them.

SecurityWeek has coverage here: Anthropic Silently Patches Claude Code Sandbox Bypass – SecurityWeek

Gidi Cohen, CEO & Co-founder, Bonfy.AI had this comment:

“The technical details here are worth understanding — a null-byte injection that tricks an allowlist filter into approving connections it should block, chainable with prompt injection to exfiltrate credentials and tokens. Anthropic fixed it. The researcher is frustrated about disclosure process. That debate will continue.

But the more important signal is structural: sandbox boundaries are policy enforcement mechanisms, and policy enforcement is only as good as the data flowing through it. When the filter sees .google.com and approves, it’s not making a security mistake — it’s doing exactly what it was told. The problem is that the data it was evaluating had already been manipulated upstream.

This is the pattern that keeps recurring across AI agent security incidents. The attack doesn’t defeat the control directly. It shapes the input so the control defeats itself. Prompt injection, malicious comments, null-byte tricks — these work because inspection is happening at the wrong layer, or not at all, and because the data moving through these systems isn’t being evaluated for what it actually contains.

Organizations deploying AI coding agents today should be asking a harder question than “is our sandbox configured correctly?” The question is whether they have any visibility into the data those agents are touching, generating, and sending — before it reaches any boundary at all.

Configuration is a starting point. It was never a substitute for understanding the data.”
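To illustrate the general class of bug described in that quote, here is a minimal sketch of how a null byte can make an allowlist check and a downstream consumer disagree about the same hostname. This is not the actual Claude Code filter or exploit; the hostnames, the suffix check, and the C-string truncation are all assumptions chosen to show the pattern.

```python
# Illustrative allowlist only; not taken from any real product configuration.
ALLOWED_SUFFIXES = (".google.com",)


def naive_allow(host: str) -> bool:
    """Approve any host whose text ends with an allowed suffix."""
    return host.endswith(ALLOWED_SUFFIXES)


def as_c_string(host: str) -> str:
    """Simulate a consumer that stops reading at the first NUL byte."""
    return host.split("\x00", 1)[0]


def strict_allow(host: str) -> bool:
    """Reject control characters and whitespace before matching suffixes."""
    if any(ord(ch) < 0x21 or ord(ch) == 0x7F for ch in host):
        return False
    return host.endswith(ALLOWED_SUFFIXES)


if __name__ == "__main__":
    crafted = "evil.example\x00.google.com"
    print(naive_allow(crafted))    # the filter sees the trusted suffix
    print(as_c_string(crafted))    # the consumer sees a different host
    print(strict_allow(crafted))   # validating the raw input first blocks it
```

The naive filter does exactly what it was told, which is the point made in the quote: the input was shaped so that the check and the code acting on it see two different values.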

I really hope that silent patching doesn’t become a trend, as it would make me less likely to trust AI-based developer tools. But I guess we will see on that front.

Anthropic restricts release of new AI model after it identifies hundreds of zero-day vulnerabilities

Posted in Commentary with tags on April 9, 2026 by itnerd

Anthropic has unveiled a new AI model, Claude Mythos Preview, capable of identifying hundreds of previously unknown high-severity vulnerabilities, including more than 500 zero-day flaws in open-source software during testing. The model demonstrated the ability to autonomously analyze codebases and surface security weaknesses at scale, significantly accelerating vulnerability discovery.

Testing also showed the model could identify vulnerabilities across major operating systems, web browsers, and widely used software, with some findings involving long-standing flaws that had gone undetected for years.

Due to these capabilities, Anthropic has restricted access to 40 technology companies, including Apple, Amazon and Microsoft, under its “Project Glasswing” initiative rather than releasing the model publicly. The limited group of organizations will use the model to find and patch security vulnerabilities in critical software programs.

Anthropic said the controlled rollout is intended to evaluate both defensive and offensive implications of AI-driven vulnerability discovery, while working with the select partners to manage risks associated with misuse of the technology.

   “The goal is both to raise awareness and to give good actors a head start on the process of securing open-source and private infrastructure and code,” Jared Kaplan, Anthropic’s chief science officer, said.

Nick Mo, CEO & Co-founder, Ridge Security Technology Inc.:

   “You can also look at this from another angle: try using Claude to write some code and see how many bugs, or even new zero-days, it produces. Claude Code is already making developers many times more productive than before, which means the number of potential vulnerabilities being introduced is also many times greater. It’s writing code and writing vulnerabilities at the same time. No wonder they’re rushing to get security companies involved first. Digging holes and filling them simultaneously, the question is just which side is faster.”

Noelle Murata, Sr. Security Engineer, Xcape, Inc.:

   “Anthropic’s Claude Mythos Preview has effectively industrialized zero-day discovery, identifying over 500 high-severity vulnerabilities in core open-source software that escaped decades of human and automated scrutiny. These findings include a 27-year-old remote crash bug in OpenBSD and a 16-year-old flaw in FFmpeg, surfaced by a “hypothesize-and-verify” loop that autonomously confirms exploits before reporting them.

   “To manage this massive “vulnerability debt,” Anthropic launched Project Glasswing, a restricted partnership with 40 tech giants like Microsoft and Apple to coordinate global patching. By pledging $100 million in compute credits to open-source maintainers, the initiative aims to bridge the gap between AI-driven discovery and the human speed of remediation, ensuring that the “Glasswing 40” don’t become the only secure entities on an otherwise broken Internet.

   “If Project Glasswing is a “cyber-nuke,” Anthropic is attempting to ensure the “mutually assured destruction” of bugs happens in a controlled vacuum before it hits the production Internet.”

Steven Swift, Managing Director, Suzu Labs:

   “Anthropic has a reputation for exaggerating the capabilities of their models, especially around their ability to find novel vulnerabilities. For example, their models have struggled with line(s) of code that could be vulnerable, but only if you ignored the preceding lines of code, that properly handled the risk and left no residual vulnerability.

   “Looking at what they’ve published so far in their Mythos Preview, they’re again making big claims. Particularly of note, is that the community is not being given access to the model at this time. That means it isn’t possible to audit big claims, and we’re left with Anthropic asking us to trust them, despite having established a pattern of misrepresentation and exaggeration on many of their other publications.

   “Let’s take a closer look at what they’re claiming, and what they’re willing to provide details on. The claim is that Mythos can find and fix novel vulnerabilities in secure code bases, that have been competently hardened via legacy tooling and review processes. To provide evidence of this capability they describe finding vulnerabilities in the following software packages: OpenBSD, FFMPEG codec H.264, an undisclosed VMM, and “several thousand more.”

   “They estimate they spent $20,000 to find the OpenBSD bug, though they said that was the total run, which found other bugs as well.

   “Great, we have two specific vulnerabilities that they’ve specifically chosen to highlight.

   “They accurately highlight the difference between vulnerability – a POTENTIAL weakness. And an exploit, a functioning piece of code that takes advantage of one or more vulnerabilities.

   “We then move on to exploit development, which is COMPLETELY different than discovering vulnerabilities. Exploits are just code. If you provide any major LLM a sufficient detail of how an exploit works, it should be able to generate a functioning exploit. This is not new. It however relies on two things 1) sufficient detail for the exploit 2) sufficient detail for the system that is being exploited.

   “They describe writing an exploit for FreeBSD which did not require human-in-the-loop interactions. However, they point out that Opus was also able to exploit the same vulnerability, though it did require such human input.

   “Additionally, when looking at the Linux kernel, they admit that they were not able to create functioning exploits with the “vulnerabilities” that were discovered.

   “They also go into great detail about a kernel exploit that Claude wrote. But for this exploit to be possible, they had to provide it PREVIOUSLY DISCOVERED context from a fuzzer. That is again, very much NOT Mythos discovering and exploiting a vulnerability. But merely demonstrating that if you provide sufficient context, these models can write code. This is the capability that they chose to highlight with the longest and most detailed technical breakdown. And while the exploit that was eventually developed is claimed to elevate privileges to root, it needs to be emphasized again here. Mythos did not “discover” this vulnerability. It merely wrote some code, after being provided sufficient technical information into its context as to what code it should write.

   “Anthropic knows what they’re doing. They’re making big claims, because attention is good for their business model. They’re providing just enough detail so that their claims look convincing at first glance. But when you look closer, claims lack substance and rely on implications that all of the examples related prove their claims. This lets the reader naturally jump to conclusions that aren’t explicitly stated, but are easy to make. And they bury this under a lengthy, fairly technical document. Making it yet more challenging for readers to decipher.”

Sunil Gottumukkala, CEO, Averlon:

   “Mythos Preview signals that zero-day discovery is becoming cheaper, faster, and more scalable. Researchers have already shown earlier models can help find serious vulnerabilities, but this represents a real capability jump. Even with restricted access, the broader implication is clear: we should expect more dangerous vulnerabilities to be found across major software platforms, and many organizations still don’t patch fast enough to keep up.

   “Once a patch is released, adversaries often move quickly to reverse engineer it and build exploits. At that point, the impact extends well beyond the small group with direct access to the model, potentially increasing overall breach volume.”

Joshua Marpet, Senior product security consultant, Finite State:

   “Anthropic limiting Mythos access to top defenders via Project Glasswing is a fantastic first step, but it needs to be codified and expanded. Expect a new model to completely break the security landscape every six to twelve months.

   “The speed of this evolution is staggering. Three years ago, LLMs barely wrote functional code. Today, they’re autonomously surfacing zero-days at scale. Tomorrow, they’ll be pointed directly at compiled binaries and firmware, exploiting the products we actually ship, not just source repositories. What does this look like five years from now?

   “Future breakthroughs won’t always come with responsible disclosure. The next leap in offensive AI will easily emerge from adversaries with zero intention of giving us a “head start.”

   “Security teams are already drowning. When adversaries start using autonomous agents to uncover zero-days, manual triage will completely break. We must shift immediately to defensive systems that cut through the noise and automatically prioritize real, reachable exposure.

   “We have to think beyond corporate consortia. We need a completely new wing of the intelligence community, agencies where humans and autonomous AI agents work side-by-side to acquire, analyze, and counter advanced adversary models.

   “The offensive landscape just went autonomous. We can no longer fight machine-speed threats with manual, point-in-time reviews. Defense must become as continuous and autonomous as the attacks coming our way.”

Bad guys are going to use this technique to pwn you. Thus you really need to put the time and effort into making sure that everything that you use is as secure as possible. And then you need to keep going back and reconfirming that you are still secure because the bad guys are going to do the same thing.

Anthropic scrambles to contain leak of proprietary Claude AI agent code

Posted in Commentary with tags on April 2, 2026 by itnerd

Anthropic is working to contain the fallout after accidentally exposing internal source code for its Claude AI coding agent. A human error during a software update made proprietary files publicly accessible. The exposure was quickly discovered by a security researcher named Chaofan Shou and posted to X.

The new version of its Claude Code software package unintentionally included a file that exposed nearly 2,000 source code files and more than 512,000 lines of code, including the tools, techniques, and internal instructions used to guide the behavior of its AI agent. This included operational components of the system and internal frameworks used to control how the AI performs tasks.

Anthropic issued thousands of takedown requests to remove the code from public repositories.

Anthropic said it is implementing changes to prevent similar issues while continuing efforts to remove the leaked materials from circulation.

Michael Bell, Founder & CEO, Suzu Labs had this comment:

   “Anthropic shipped a 60MB source map inside their npm package. Every line of Claude Code’s source, all 512,000 of them, publicly available. For the second time. The first leak was February 2025 and the root cause was never fixed.

   “We pulled the codebase apart. The headline findings are real but the details are worse. Undercover Mode instructs Claude to disguise itself as a human developer when contributing to open source: “Do not blow your cover.” There is no force-off option. Frustration tracking runs a regex on every user input and silently sends your emotional state to Anthropic’s analytics pipeline without notification or consent. That emotional classification also feeds a system that can prompt users to share their full session transcript with Anthropic, controlled by remote feature flags that Anthropic can activate at any time.

   “The finding that matters most for government and defense: the default telemetry collects device IDs, session data, email, org UUID, and process tree information on startup before the user types anything. Environment flags can escalate collection to include full prompts, file contents, bash command output, system prompts, and entire conversation transcripts sent to commercial endpoints. The code confirms FedRAMP OAuth paths to claude.fedstart.com, meaning government deployments share the same codebase. Whether hardening was applied before those deployments is unknown, but the telemetry infrastructure is baked into the foundation. The Pentagon designated Anthropic a “supply chain risk” in March. This is what that risk looks like in code.

   “The engineers documented their own attack surfaces in comments. Prompt-injected models can exfiltrate secrets via GitHub CLI URL paths. Leaked GitHub Actions tokens enable “repo takeover” and “supply-chain pivot.” Bash parsing ambiguity allows commands to execute while hidden from security validators. They built mitigations, but the comments confirm the attack surfaces exist.

   “The AI safety company with a $380 billion IPO target acquired Bun, whose known source-map-in-production bug was filed publicly and left open while the product shipped to millions of developers. Their operational security posture is a .npmignore file that nobody checked the second time around.”
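
For anyone who ships npm packages, the packaging mistake described above is the kind of thing a pre-publish check can catch. Here is a minimal sketch in Python, assuming you have already run `npm pack` to produce the package tarball. The function names `find_source_maps` and `source_maps_in_tarball` are hypothetical and made up for this illustration. They are not part of npm or of anything in the leaked code, and matching on the `.map` suffix is a simplifying assumption.

```python
import tarfile
from pathlib import PurePosixPath


def find_source_maps(file_names):
    """Return the entries whose names end in .map (assumed to be source maps)."""
    return [name for name in file_names if PurePosixPath(name).suffix == ".map"]


def source_maps_in_tarball(tarball_path):
    """List source-map entries inside a gzipped package tarball, such as one written by `npm pack`."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        return find_source_maps(archive.getnames())


# Example file listing, invented for illustration only.
example_names = ["package/cli.js", "package/cli.js.map", "package/package.json"]
flagged = find_source_maps(example_names)
```

A release pipeline could call `source_maps_in_tarball` on the packed tarball and refuse to publish if the returned list is not empty.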

Jacob Krell, Senior Director: Secure AI Solutions & Cybersecurity, Suzu Labs had this to say:

   “The model is the engine. What Anthropic accidentally published is the machine built around it.

   “Anthropic has been here before. This is the second time Claude Code’s source has leaked through the same vector, a source map file left in the npm package. The first was in February 2025. Thirteen months later, the same packaging mistake exposed a far more complex system, days after the accidental exposure of details about an unreleased model codenamed Mythos.

   “The significance of this leak is in what the code reveals about AI agent architecture. The leak exposed approximately 512,000 lines of TypeScript across roughly 1,900 source files. Developers and researchers who have analyzed the source have since documented the scale of what Anthropic built around the model. The code contains what analysts describe as 44 feature flags for unreleased capabilities, approximately 40 permission gated tools, a multi agent coordination system, a persistent autonomous daemon mode, a layered memory architecture, defenses against competitor model distillation, and granular attribution tracking for AI versus human code contributions. The leaked code strongly suggests that the bulk of Claude Code’s production capability comes from orchestration, tooling, memory, and permission layers built around the model.

   “The multi agent coordinator mode, as documented in the leaked source, illustrates where the engineering complexity lives. The code describes a system where Claude Code operates not as a single model session but as a supervisor managing a fleet of worker agents executing tasks in parallel. In the leaked architecture, the coordinator does not directly edit files, run commands, or read code. All implementation goes through workers. Verification is handled by what the code describes as a separate adversarial agent that must confirm the output works before the task can be marked complete. In effect, this is zero trust architecture applied to AI agents, with the orchestration system enforcing verification independently of the model.

   “The leaked code also references an autonomous daemon mode, internally called KAIROS. The source describes a persistent agent that watches the developer’s project and proactively acts without waiting for user input. It uses a tick based lifecycle with periodic prompts, and the code indicates behavior that adjusts based on whether the developer’s terminal is active. The source also references memory consolidation during idle periods, converting observations into structured facts. These features represent event driven architecture, state management, and context engineering built entirely in the orchestration layer.

   “The code also contains what analysts describe as a competitive defense embedded directly in the orchestration layer. The system references injecting artificial tool definitions into certain API responses, apparently designed to degrade the performance of any competitor model trained on Claude’s outputs. That defense lives in the scaffolding. It tells you where Anthropic believes their competitive advantage sits.

   “The depth of interlocking systems documented in the leaked code is what stands out. The coordinator depends on the memory system, the memory system depends on the tool layer, the tool layer depends on the permission framework. These systems are deeply interdependent, and building them to work in concert at production quality is the hard engineering problem. The public conversation about AI capabilities focuses almost entirely on which model is smarter. What this leak suggests is that the model generates the next token, and everything around it is what turns that reasoning into reliable, operational capability.

   “This leak also serves as a proof of concept for the rest of the industry. The engineering gap between a frontier research lab and a commercial competitor appears narrower than many assumed. The architectural patterns documented in the leaked source are well structured and reproducible in principle. A competent engineering team can study the coordination strategies, memory approaches, and tool integration designs and adapt the approach using any available foundation model. The model layer is swappable. The orchestration patterns are the transferable knowledge. What Anthropic built behind closed doors is now visible, and for anyone questioning whether a smaller team could build a credible AI coding agent, the architectural proof of concept is now public.

   “The knowledge transfer effect is significant. Developers who were building AI coding tools through trial and error now have a detailed reference implementation from a team backed by billions in research and development. The architectural decisions, trade-offs, prompt engineering techniques, and multi agent coordination strategies are all visible. The effect extends beyond direct competitors. It raises the floor for every developer building with AI. The gap between what a frontier lab understood about AI agent architecture and what the broader developer community understood has been enormous. That gap collapsed overnight.

   “The model is increasingly a commodity. Multiple frontier models are available from multiple providers, and the performance gap between them continues to narrow. The orchestration system built around the model is the competitive frontier, and Anthropic just published the blueprint.”
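
To make the coordinator pattern described above a bit more concrete, here is a rough sketch in Python of a supervisor that hands tasks to parallel workers and only marks a task complete when a separate verifier accepts the output. This is not Anthropic's code. Every name in it (`coordinate`, `TaskResult`, `demo_worker`, `demo_verifier`) is hypothetical, and plain functions stand in for what the commentary describes as worker agents and an adversarial verification agent.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskResult:
    task: str
    output: str
    complete: bool  # True only when the verifier accepted the output


def coordinate(
    tasks: List[str],
    worker: Callable[[str], str],
    verifier: Callable[[str, str], bool],
    max_workers: int = 4,
) -> List[TaskResult]:
    """Fan tasks out to workers in parallel, then gate completion on a separate check."""
    # The coordinator never does the work itself; it only dispatches and collects.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(worker, tasks))
    # A task counts as complete only if the independent verifier confirms the output.
    return [
        TaskResult(task, output, verifier(task, output))
        for task, output in zip(tasks, outputs)
    ]


# Stand-in worker and verifier, purely for illustration.
def demo_worker(task: str) -> str:
    return "" if task == "flaky" else f"done: {task}"


def demo_verifier(task: str, output: str) -> bool:
    return output == f"done: {task}"


results = coordinate(["lint", "flaky", "test"], demo_worker, demo_verifier)
```

The point of the design is that the worker does not get to grade its own work: the "flaky" task comes back with `complete` set to `False` because the verifier, not the worker, decides.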

Vishal Agarwal, CTO, Averlon adds this:

   “The deeper risk here isn’t what was exposed, it’s what becomes possible. When AI coding agent internals are public, attackers can study how those agents interpret context, follow instructions, and make decisions.

   “That makes it easier to craft inputs or artifacts that appear legitimate to developers but influence how the agent behaves: modifying code, introducing insecure changes, or interacting with downstream systems. This expands the attack surface beyond the model itself into developer workflows, CI/CD pipelines, and the systems those pipelines connect to.”

This is embarrassing for Anthropic. But I honestly am not shocked by this. They clearly need to tighten things up or this will keep happening, which of course is bad for them.