Claude Code Source Leak: What the Exposed Code Reveals About AI Agent Architecture

Wednesday, April 1, 2026


On March 31, 2026, Anthropic accidentally shipped .map source map files inside the Claude Code npm package. Within hours, the full TypeScript source code was readable by anyone who installed it. The leak exposed the complete architecture behind one of the most capable AI coding agents available today.

This isn't about drama. The leaked code is a masterclass in production AI agent design, and there are concrete patterns every developer building agents should understand.

How the leak happened

Source maps are debug files that map minified/bundled code back to original source. They're standard in development but should never ship in production packages. A build configuration issue — reportedly related to Bun's bundler defaults — included .map files in the published npm tarball.

Anyone who ran npm pack @anthropic-ai/claude-code could extract and read the full, un-minified source.

Anthropic pulled the package and republished within hours, but mirrors and forks had already captured everything.

The system prompt is enormous — and carefully layered

The first thing that stands out is the system prompt construction. It's not a single string. It's built by a SystemPromptBuilder that assembles sections dynamically:

  1. Identity and role — "You are an interactive agent that helps users with software engineering tasks"
  2. System rules — tool execution model, permission awareness, prompt injection warnings, context compression behavior
  3. Task execution guidelines — read before modifying, avoid speculative abstractions, no unnecessary files, diagnose failures instead of retrying blindly
  4. Action safety — reversibility and blast radius awareness for destructive operations
  5. Environment context — model family, working directory, date, platform, git status snapshot
  6. Project context — discovered CLAUDE.md instruction files with a truncation budget (4K per file, 12K total)
  7. Runtime configuration — loaded settings and user preferences

The dynamic boundary marker __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__ separates static instructions from per-session context. This matters for prompt caching — everything above the boundary can be cached across sessions, saving significant API costs.

[Static instructions - cacheable]
__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__
[Environment + project context - per-session]
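The split above can be sketched in a few lines. This is a hypothetical illustration, not the leaked builder: only the boundary marker string comes from the source, and the section names and join format are assumptions.

```python
# Hypothetical sketch of a cache-aware system prompt assembly.
# Only the boundary marker is from the leaked code; the rest is illustrative.
BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__"

def build_system_prompt(static_sections, dynamic_sections):
    """Join static (cacheable) sections, the boundary marker, then
    per-session sections. A client can cache everything up to the
    boundary across sessions."""
    static = "\n\n".join(static_sections)
    dynamic = "\n\n".join(dynamic_sections)
    return f"{static}\n{BOUNDARY}\n{dynamic}"

def cacheable_prefix(prompt):
    """Everything before the boundary is stable across sessions."""
    return prompt.split(BOUNDARY, 1)[0]
```

The point is that the cache boundary is a deliberate seam in the string, not an accident of concatenation order.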

This is a real architectural decision driven by economics. With Opus-class models priced around $15 per million input tokens and $75 per million output tokens, caching the static portion of a multi-thousand-token system prompt saves real money at scale.

The tool system: 19 tools with a permission model

Claude Code exposes 19 tools to the model, each with an assigned permission level:

| Permission Level | Tools |
|------------------|-------|
| ReadOnly | read_file, glob_search, grep_search, WebFetch, WebSearch, Skill, ToolSearch, Sleep |
| WorkspaceWrite | write_file, edit_file, TodoWrite, NotebookEdit, Config |
| DangerFullAccess | bash, Agent, REPL, PowerShell |

Five permission modes control what's allowed: ReadOnly, WorkspaceWrite, DangerFullAccess, Prompt (always ask), and Allow (always permit). The check is a simple comparison — the active mode must be >= the tool's required level.
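The ordered comparison can be sketched with an `IntEnum`. This is a minimal illustration of the check described above, not the leaked implementation; the encoding as integers is an assumption, and the Prompt and Allow modes (which short-circuit to asking or permitting) are left out for brevity.

```python
from enum import IntEnum

# Sketch of the "active mode >= required level" check. The tool-to-level
# mapping below lists a few examples from the article's table.
class Permission(IntEnum):
    READ_ONLY = 0
    WORKSPACE_WRITE = 1
    DANGER_FULL_ACCESS = 2

TOOL_LEVELS = {
    "grep_search": Permission.READ_ONLY,
    "write_file": Permission.WORKSPACE_WRITE,
    "bash": Permission.DANGER_FULL_ACCESS,
}

def is_allowed(active_mode: Permission, tool: str) -> bool:
    # A tool may run only if the session's mode meets its required level.
    return active_mode >= TOOL_LEVELS[tool]
```

So a session in WorkspaceWrite mode can grep and write files but cannot spawn a shell.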

What's interesting is that bash requires full access while grep_search doesn't. This reflects the real security boundary: reading files is safe, but arbitrary shell execution can do anything.

The Agent tool — which spawns sub-agents — also requires DangerFullAccess. A sub-agent inherits the parent's conversation context but runs as a separate API call. This is how Claude Code parallelizes independent tasks.

The agentic loop is simpler than you'd think

The core conversation loop follows a pattern that every agent framework uses, but the implementation is surprisingly clean:

while True:
    response = call_model(messages)

    if not response.tool_calls:
        break

    for tool_call in response.tool_calls:
        check_permission(tool_call)
        run_pre_hook(tool_call)
        result = execute_tool(tool_call)
        run_post_hook(tool_call, result)
        append_result_to_messages(result)
The max iteration limit is effectively unlimited (usize::MAX in the Rust reimplementation). Claude Code trusts the model to terminate. In practice, it does — the system prompt explicitly instructs the model to avoid retrying failed operations in loops.

The real complexity is in the edges: what happens when a tool needs user permission, when a hook denies execution, when the context window fills up.

Auto-compaction: how it handles long sessions

When cumulative input tokens exceed 200K (configurable via CLAUDE_CODE_AUTO_COMPACT_INPUT_TOKENS), the compaction system kicks in:

  1. Preserve the N most recent messages (default: 4)
  2. Summarize everything older, extracting:
    • Tool names mentioned
    • Recent user requests
    • Pending work items (detected by keywords like "todo", "next", "pending")
    • Key files referenced (detected by path patterns)
    • A compressed timeline
  3. Replace the old messages with a single system message: "This session is being continued from a previous conversation"

This is how Claude Code handles sessions that run for hundreds of turns without hitting context limits. The model sees a summary of what happened plus the recent conversation, which is usually enough to maintain coherence.
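The compaction step above can be sketched as a pure function. This is a simplified illustration: the keyword heuristics and the continuation message come from the article, but the summary format and the function's shape are assumptions, and the real system extracts tool names, file paths, and a timeline as well.

```python
# Minimal sketch of auto-compaction: keep the last N messages verbatim,
# collapse everything older into one system message.
PENDING_KEYWORDS = ("todo", "next", "pending")

def compact(messages, keep_recent=4):
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Detect pending work items by keyword, as described above.
    pending = [m for m in old
               if any(k in m["content"].lower() for k in PENDING_KEYWORDS)]
    summary = {
        "role": "system",
        "content": ("This session is being continued from a previous "
                    f"conversation. {len(old)} earlier messages summarized; "
                    f"{len(pending)} mentioned pending work."),
    }
    return [summary] + recent
```

The key property is that the output is bounded: however long the session, the model sees one summary message plus a fixed recent window.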

The article that first reported the leak also claimed a bug in this system was causing around 250K wasted API calls per day across all Claude Code users — consecutive compaction failures weren't being capped, leading to retry loops.

The hook system: extensibility without plugins

Hooks are shell commands that run before or after tool execution:

  • PreToolUse hooks: run before execution. Exit code 0 = allow, exit code 2 = deny, anything else = warn
  • PostToolUse hooks: run after execution with the same exit code semantics

Hook commands receive context via stdin as JSON and through environment variables (HOOK_EVENT, HOOK_TOOL_NAME, HOOK_TOOL_INPUT). Their stdout becomes feedback merged into the tool result.
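The invocation contract can be sketched with `subprocess`. This is a minimal, assumed implementation of the semantics described above (JSON on stdin, `HOOK_*` environment variables, exit code 0 = allow, 2 = deny, other = warn); the exact JSON shape is an assumption.

```python
import json
import os
import subprocess

# Sketch of running a PreToolUse/PostToolUse hook command.
def run_hook(command, event, tool_name, tool_input):
    env = dict(os.environ,
               HOOK_EVENT=event,
               HOOK_TOOL_NAME=tool_name,
               HOOK_TOOL_INPUT=json.dumps(tool_input))
    proc = subprocess.run(
        command, shell=True, env=env,
        input=json.dumps({"event": event, "tool": tool_name,
                          "input": tool_input}),
        capture_output=True, text=True)
    # Exit code semantics: 0 = allow, 2 = deny, anything else = warn.
    verdict = {0: "allow", 2: "deny"}.get(proc.returncode, "warn")
    return verdict, proc.stdout  # stdout is merged back as feedback
```

A deny hook can be a one-line shell script that greps the tool input and exits 2.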

This is a clever design. Instead of building a plugin API with versioning headaches, hooks are just shell scripts. They can be written in any language, they're easy to debug, and they compose naturally with existing tooling.

CLAUDE.md discovery walks the entire directory tree

The instruction file discovery is more thorough than most people realize. Starting from the current working directory, it walks every ancestor directory up to the filesystem root, checking for:

  • CLAUDE.md
  • CLAUDE.local.md
  • .claude/CLAUDE.md
  • .claude/instructions.md

Files are deduplicated by content hash and loaded with a total budget of 12K tokens (4K per file). This means you can have project-level instructions, monorepo-level instructions, and user-level instructions that all compose together.
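The walk-and-dedupe logic might look like the sketch below. The candidate filenames and budget numbers are from the article; the 4-characters-per-token estimate and the function's shape are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Sketch of ancestor-directory discovery with content-hash dedup and a
# total token budget, as described above.
CANDIDATES = ("CLAUDE.md", "CLAUDE.local.md",
              ".claude/CLAUDE.md", ".claude/instructions.md")

def discover_instructions(cwd, per_file_tokens=4000, total_tokens=12000):
    seen, loaded, used = set(), [], 0
    # Walk from cwd up to the filesystem root.
    for directory in [Path(cwd), *Path(cwd).parents]:
        for name in CANDIDATES:
            path = directory / name
            if not path.is_file():
                continue
            text = path.read_text()[: per_file_tokens * 4]  # ~4 chars/token
            digest = hashlib.sha256(text.encode()).hexdigest()
            if digest in seen:
                continue  # identical content found higher up: skip
            seen.add(digest)
            budget_chars = (total_tokens * 4) - used
            if budget_chars <= 0:
                return loaded
            loaded.append((str(path), text[:budget_chars]))
            used += len(text)
    return loaded
```

Content-hash dedup matters in monorepos, where the same CLAUDE.md is often symlinked or copied into several packages.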

Sandbox: namespace isolation on Linux, fallback on macOS

The sandbox system has three modes: Off, WorkspaceOnly (default), and AllowList.

On Linux, it uses unshare to create actual namespace isolation — user, mount, IPC, PID, and UTS namespaces — with optional network isolation. This is real container-like isolation without needing Docker.
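Wrapping a command this way can be sketched as building an `unshare(1)` argument vector. The namespace list mirrors the one above; the exact flag set Claude Code uses is not shown in the article, so treat this as an assumed approximation (`--fork` and `--map-root-user` are practical additions for unprivileged PID namespaces).

```python
# Sketch: wrap a command in Linux namespace isolation via unshare(1).
def sandbox_argv(command, isolate_network=False):
    flags = [
        "--user", "--mount", "--ipc", "--pid", "--uts",
        "--map-root-user",  # map the caller to root inside the user ns
        "--fork",           # required so the child is PID 1 in the pid ns
    ]
    if isolate_network:
        flags.append("--net")  # optional network isolation
    return ["unshare", *flags, *command]
```

The result would be passed to an exec-style API; no Docker daemon is involved.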

On macOS, it falls back to overriding environment variables (HOME, TMPDIR) to point at sandbox directories. This is weaker isolation but better than nothing.

Container detection checks for /.dockerenv, /run/.containerenv, various environment variables, and /proc/1/cgroup contents — so the sandbox adapts when Claude Code is already running inside a container.
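Those checks reduce to a short predicate. This sketch makes the inputs injectable for testing; the marker paths and cgroup check are from the article, while the specific environment variable names (`KUBERNETES_SERVICE_HOST`, `container`) are common conventions assumed here, not confirmed by the leak.

```python
import os

# Sketch of container detection: marker files, env vars, cgroup contents.
def in_container(marker_paths=("/.dockerenv", "/run/.containerenv"),
                 environ=None, cgroup_text=""):
    environ = environ if environ is not None else os.environ
    if any(os.path.exists(p) for p in marker_paths):
        return True
    # Kubernetes and podman both advertise themselves via the environment.
    if any(k in environ for k in ("KUBERNETES_SERVICE_HOST", "container")):
        return True
    # /proc/1/cgroup mentions the runtime when PID 1 lives in a container.
    return any(s in cgroup_text for s in ("docker", "containerd", "kubepods"))
```

When this returns true, layering another namespace sandbox on top is skipped or softened.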

Anti-distillation: poisoning the training data

One of the more surprising findings: a flag called ANTI_DISTILLATION_CC that injects decoy tool definitions into the system prompt. The purpose is to poison the training data of anyone recording Claude Code's API traffic to train competing models.

If you're intercepting API calls to distill Claude's behavior into a smaller model, these fake tools would corrupt your training data with tool definitions that don't actually exist. It's a creative defense against a real threat — multiple companies have been caught training on API traffic from commercial models.

What this teaches us about building agents

The leaked architecture validates several patterns that the open-source agent community has been converging on:

1. System prompts should be constructed, not written. A builder pattern that assembles context from multiple sources (static instructions, environment, project config, user preferences) produces better results than a monolithic prompt string.

2. The tool permission model matters. Not every tool should be equally accessible. Read operations are fundamentally different from write operations, and shell execution is in a category of its own.

3. Auto-compaction is essential for long sessions. You can't rely on ever-growing context windows. A summarization strategy that preserves recent context while compressing history is the practical solution.

4. Hooks beat plugins. Shell-based extensibility is simpler, more debuggable, and more composable than a formal plugin API. The tradeoff is less structure, but for a CLI tool that's usually the right call.

5. Prompt caching economics drive architecture. The static/dynamic split in the system prompt isn't an aesthetic choice — it's an optimization that saves real money. When you're paying per token, cache boundaries become architectural decisions.

The open-source response

Within hours of the leak, open-source reimplementations appeared. The most notable is Claw Code by Sigrid Jin, which hit 50K GitHub stars within two hours. It includes both a Python metadata layer that catalogs the original TypeScript structure and a functional Rust port that can actually interact with the Anthropic API.

The Rust implementation covers the core loop, all 19 tools, the permission model, hooks, sandbox isolation, OAuth, session compaction, and the full CLI with REPL mode. It's a clean-room rewrite — not a copy-paste — and it demonstrates that the architecture is sound enough to reimplement from the design alone.

The real takeaway

The leaked code isn't revolutionary. There's no secret sauce. What makes Claude Code effective is the combination of a capable model with careful engineering: thoughtful prompt construction, a well-designed permission model, robust session management, and pragmatic extensibility.

That's actually the most useful lesson. If you're building an AI agent, you don't need exotic architectures. You need solid engineering applied to the fundamentals: how the model sees context, how tools are permissioned and executed, how long sessions are managed, and how users can extend behavior.

The code that leaked is good engineering. It's not magic.

FAQ

Is the leaked Claude Code source still available?

The original npm package was pulled and republished without source maps within hours. However, mirrors, forks, and open-source reimplementations captured the architecture before removal.

Does the leak affect Claude Code users' security?

No. The leak exposed Anthropic's own source code, not user data. Your API keys, conversation history, and local files were never at risk.

What is the CLAUDE.md file and how does Claude Code use it?

CLAUDE.md is a project instruction file that Claude Code discovers automatically. It walks your directory tree from CWD to root, loading CLAUDE.md files at each level with a 12K token budget total. This is how you give Claude Code project-specific context and rules.

How does Claude Code handle long conversations without running out of context?

It uses auto-compaction. When input tokens exceed 200K, older messages are summarized into a compressed system message while the 4 most recent messages are preserved verbatim. This lets sessions run for hundreds of turns.

What programming language is Claude Code written in?

The original Claude Code is written in TypeScript, bundled with Bun. The most popular open-source reimplementation (Claw Code) uses Rust for the functional implementation and Python for a metadata/cataloging layer.


Vadim Alakhverdov

Software developer writing about JavaScript, web development, and developer tools.
