On March 31, 2026, Anthropic accidentally shipped the Claude Code source code — a missing .npmignore entry bundled a 59.8 MB source map inside npm package v2.1.88, exposing ~512,000 lines of unobfuscated TypeScript. Within hours it was mirrored across GitHub.
Massive media coverage followed:
- 187 loading spinner verbs, including “hullaballooing” and “razzmatazzing”
- regex-based frustration detection triggered by “wtf” and “ffs”
- 18 terminal pets for the buddy system
- and a mysterious “Mythos” model that had leaked a week before in a separate incident.
But those were all noise.
I wanted to know – what made the Claude Code harness best-in-class? In this first post of a Claude Code Harness deep-dive, we look into its memory management system. Turns out, Claude Code’s memory management is strikingly similar to the wiki-skills system I’d built separately – they’re convergent solutions to the same underlying problem: how does a context-limited agent maintain coherent state over time?
Memory Management — Five Layers
Claude Code has five distinct memory systems, each operating at a different timescale and purpose. Understanding all five is essential for building a robust harness.
Each `submitMessage()` call:

```
1. loadMemoryPrompt() → inject relevant memdir memories (age-weighted)
2. fetch CLAUDE.md files → inject as nested_memory attachments (dedup via loadedNestedMemoryPaths)
3. LLM call with full context
4. Tool calls execute (FileStateCache updated on each read)
5. Final response → post-turn:
   a. extractMemories (forked agent, async)
   b. autoDream check (time + session + lock gates)
   c. sessionMemory update
```
Layer 1: FileStateCache — What the Model Has Seen
Summary. FileStateCache is the model’s working memory for files — an index of what the model has most recently read, with old reads evicted when the cache fills up. Both the isPartialView flag and cache eviction force the model to re-read before editing, so it never acts on stale or unseen content.
File: src/utils/fileStateCache.ts
An LRU cache (100 entries, 25MB max) tracking every file the model has read this session.
```typescript
export type FileState = {
  content: string
  timestamp: number
  offset: number | undefined
  limit: number | undefined
  isPartialView?: boolean // ← the key safety flag
}
```
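For intuition, here is a minimal sketch of the dual-cap LRU behavior described above. The class shape, method names, and eviction details are assumptions for illustration, not the leaked implementation:

```typescript
import { normalize } from 'node:path'

// Illustrative dual-cap LRU over the FileState type above: evicts the
// least-recently-read files once either cap (100 entries / 25 MB) is exceeded.
class FileStateCache {
  private entries = new Map<string, FileState>() // Map preserves insertion order
  private totalBytes = 0

  constructor(private maxEntries = 100, private maxBytes = 25 * 1024 * 1024) {}

  set(path: string, state: FileState): void {
    const key = normalize(path) // cosmetic path differences share one key
    const prev = this.entries.get(key)
    if (prev) {
      this.totalBytes -= Buffer.byteLength(prev.content)
      this.entries.delete(key)
    }
    this.entries.set(key, state)
    this.totalBytes += Buffer.byteLength(state.content)
    // Evict the oldest reads until both caps hold again
    // (but never evict the entry just written).
    while (
      (this.entries.size > this.maxEntries || this.totalBytes > this.maxBytes) &&
      this.entries.size > 1
    ) {
      const oldest = this.entries.keys().next().value as string
      this.totalBytes -= Buffer.byteLength(this.entries.get(oldest)!.content)
      this.entries.delete(oldest)
    }
  }

  get(path: string): FileState | undefined {
    const key = normalize(path)
    const hit = this.entries.get(key)
    if (hit) {
      // Refresh recency: delete + re-insert moves the key to the newest slot.
      this.entries.delete(key)
      this.entries.set(key, hit)
    }
    return hit
  }
}
```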
The isPartialView flag is the most important field here. It’s set when a file was auto-injected (e.g., CLAUDE.md) but the model only saw a processed version (HTML stripped, frontmatter removed, truncated MEMORY.md). The raw disk bytes are stored in content for diffing, but the model never saw them.
When isPartialView is true, FileEditTool and FileWriteTool require an explicit FileReadTool call first. This prevents the model from issuing edits based on content it’s never actually seen. The type system enforces it.
What this is in plain terms. FileStateCache is the model’s reading history for a session. Every time Claude Code reads a file, that file gets logged here. The isPartialView flag solves a subtle correctness problem: sometimes Claude Code pre-processes a file before showing it to the model (stripping HTML, truncating long files). The raw bytes on disk differ from what the model saw. Without tracking this, the model could issue an edit against content it never actually read — silently corrupting a file. So the type system enforces: if isPartialView is true, you must explicitly re-read the file before editing it.
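A hypothetical runtime rendering of that rule (the real enforcement is reportedly in the type system; this guard is purely illustrative):

```typescript
// A partial view is never a valid basis for an edit.
function assertSafeToEdit(path: string, state: FileState | undefined): FileState {
  if (!state) {
    throw new Error(`${path}: not read this session; call FileReadTool first`)
  }
  if (state.isPartialView) {
    throw new Error(
      `${path}: the model only saw a processed view; re-read with FileReadTool before editing`,
    )
  }
  return state
}
```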
Path normalization. All keys are normalize()d before storage — relative paths, .. segments, and mixed separators all resolve to the same cache key. This prevents cache misses from cosmetic path differences.
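For example, assuming Node’s path.normalize() is what’s applied:

```typescript
import { normalize } from 'node:path'

// Cosmetically different spellings of the same file collapse to one cache key:
normalize('/repo/src/../src/utils/fileStateCache.ts') // → '/repo/src/utils/fileStateCache.ts'
normalize('/repo/./src/utils/fileStateCache.ts')      // → '/repo/src/utils/fileStateCache.ts'
```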
Takeaway: Auto-injection creates a gap between what’s on disk and what the model has seen. That gap must be tracked explicitly — without it, the system cannot know which files are safe to edit and which require a re-read first.
Layer 2: memdir — Persistent Semantic Memory
Summary. memdir is where memories survive across sessions. It’s distinct from CLAUDE.md: CLAUDE.md is static instruction (“what to do”), memdir is dynamically extracted memory (“what was true”). On each query, Sonnet selects up to 5 relevant memories from a metadata manifest — cheap to score, precise to load. Age doesn’t filter memories out; it modulates trust. Old memories still get loaded but with a staleness caveat, preventing stale context from silently overriding recent reality while not discarding potentially useful history.
Files: src/memdir/
The memdir system is a file-based persistent memory directory at ~/.claude/projects/<path>/memory/. It has semantic structure:
- `findRelevantMemories.ts` — LLM-based relevance scoring via `sideQuery()` (Claude Sonnet)
- `memoryScan.ts` — scans the memory directory, extracts frontmatter metadata (up to 200 files), sorts newest-first
- `memoryAge.ts` — injects a staleness caveat for memories >1 day old (does not filter them out)
- `memoryTypes.ts` — typed memory categories (user, feedback, project, reference)
- `paths.ts` — path resolution, `isAutoMemoryEnabled()` gate
How relevance scoring works. findRelevantMemories() calls selectRelevantMemories(), which sends a manifest of memory metadata (not full content) to Claude Sonnet via sideQuery(). Sonnet receives the user’s query, the manifest (type + filename + timestamp + description), and recently-used tools. It selects up to 5 memories that will “clearly be useful” — selective by design. Sonnet sees the timestamps in the manifest and weighs both semantic relevance and age together. Full content is only loaded for the selected entries. This is progressive discovery: tiny tokens to score, targeted load.
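A sketch of that flow, with the manifest shape inferred from the fields named above; sideQuery()’s signature and the readMemoryFile() helper are assumptions, not the leaked code:

```typescript
// Manifest entries carry metadata only; full content stays on disk.
type MemoryManifestEntry = {
  type: 'user' | 'feedback' | 'project' | 'reference'
  filename: string
  timestamp: string   // Sonnet weighs age together with semantic relevance
  description: string // the one-line frontmatter description
}

// Hypothetical shape of the scoring side-call: tiny tokens in, filenames out.
declare function sideQuery(opts: { model: string; prompt: string }): Promise<string[]>
declare function readMemoryFile(filename: string): Promise<string>

async function findRelevantMemories(query: string, manifest: MemoryManifestEntry[]) {
  const selected = await sideQuery({
    model: 'claude-sonnet',
    prompt: [
      `User query: ${query}`,
      `Select up to 5 memories that will clearly be useful.`,
      `Manifest: ${JSON.stringify(manifest)}`,
    ].join('\n'),
  })
  // Progressive discovery: load full content only for the selected entries.
  return Promise.all(selected.slice(0, 5).map(readMemoryFile))
}
```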
Why descriptions matter. Sonnet reads the one-line description in each memory’s frontmatter to decide relevance. A vague description (“some user info”) won’t score well. Precise descriptions (“user prefers TypeScript, switched from Python in April 2026”) score correctly.
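For illustration, a hypothetical memory file — the frontmatter fields follow what memoryScan.ts reportedly extracts, but the exact format is assumed:

```markdown
---
type: user
timestamp: 2026-04-10T09:12:00Z
description: user prefers TypeScript, switched from Python in April 2026
---
The user migrated the main repo from Python to TypeScript in April 2026
and asked that new examples default to TypeScript.
```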
Memories are loaded at query time via loadMemoryPrompt() in QueryEngine.ts. The system prefetches relevant memories during parallel startup and injects them as context.
Auto-memory vs CLAUDE.md. CLAUDE.md is project-level instruction. memdir is project-specific extracted memories from past sessions. They’re injected differently and serve different purposes.
Takeaway: Progressively discover relevant context — maintain a cheap, scannable index with precise descriptions, then load full content only after relevance is confirmed. Keep descriptions specific; vague metadata degrades selection silently. Age modulates trust, not access.
Interpretation: memdir’s architecture is structurally identical to a well-organized wiki:
| memdir | Wiki |
|---|---|
| MEMORY.md index | wiki/index.md |
| One-line description per memory file | One-line summary per [[page]] |
| Sonnet scores metadata manifest to pick relevant files | Query skill scans index to find relevant pages |
| Full file loaded only when selected | Full page read only when needed |
| Cap of 5 files per query | Focused set of pages per session |
| memoryTypes.ts — user/feedback/project/reference | Tag system — patterns/modules/flows/decisions |
The core insight is the same: don’t load everything, load the right things. Keep a cheap scannable index with precise descriptions. Load full content only after relevance is confirmed. The difference is who scores relevance — memdir uses Sonnet via sideQuery(), a wiki uses a human or a query skill. Claude Code essentially built a wiki for itself.
Layer 3: extractMemories — Post-Turn Forked Agent
File: src/services/extractMemories/extractMemories.ts
After each complete query loop (model produces a final response with no tool calls), a forked agent runs to extract durable memories from the session transcript and write them to the memdir.
```
main loop completes → handleStopHooks → initExtractMemories() fires
        ↓
runForkedAgent(
  prompt: "extract memories from this transcript",
  tools: [FileRead, FileWrite, FileEdit, Glob, Grep, Bash]
)
        ↓
writes to ~/.claude/projects/.../memory/
```
Key design decisions:
- Uses `runForkedAgent()` — shares the parent’s prompt cache, so extraction is nearly free at the cache layer
- Closure-scoped state (not module-level), so tests can call `initExtractMemories()` in `beforeEach` for fresh state (see the sketch below)
- Gated by `isAutoMemoryEnabled()` and a GrowthBook feature flag
- Supports team memory sync (`TEAMMEM` feature flag) — memories extracted to shared team paths
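A sketch of the closure-scoped state decision; beyond initExtractMemories() itself, all names here are illustrative:

```typescript
// State lives inside the closure, not at module scope, so calling
// initExtractMemories() again (e.g. in a test beforeEach) yields a clean slate.
export function initExtractMemories() {
  let running = false
  let lastRunAt: number | null = null

  return {
    async maybeExtract(runAgent: () => Promise<void>): Promise<void> {
      if (running) return // at most one extraction in flight
      running = true
      try {
        await runAgent() // the forked agent from the flow above
        lastRunAt = Date.now()
      } finally {
        running = false
      }
    },
    stats: () => ({ running, lastRunAt }),
  }
}
```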
Wiki analogy. Layer 3 maps to /wiki-ingest — both distill a completed unit of work into durable, structured knowledge. The difference: extractMemories fires automatically after every turn; /wiki-ingest is manual, requiring the human to decide what’s worth capturing. Automatic extraction is consistent but risks noise and loss of nuance; manual extraction has higher signal but requires a human in the loop (HITL). Claude Code resolves this by enforcing typed categories (user, feedback, project, reference) to filter conversational noise. The wiki resolves it with human judgment.
Takeaway: Memory must be continuously updated to be useful. Claude Code does this automatically, with typed categories filtering out conversational noise. Wiki updates are more free-form and require HITL. But the process can be automated once the memory types for a use case are settled — hooks can trigger wiki updates at session end (PostSession hook), and the forked-subagent flow makes the extraction efficient.
Layer 4: autoDream — Background Memory Consolidation
Summary: Fires a consolidation prompt that merges observations, removes contradictions, and converts vague insights into durable facts. A DreamTask entry tracks it in AppState so the UI can show it.
File: src/services/autoDream/autoDream.ts
Applies a three-gate pattern (time → count → lock), standard short-circuit evaluation applied to resource acquisition: filter on cheap conditions first, acquire expensive resources last.
```
Time gate:    hours since lastConsolidatedAt >= minHours
   ↓ (pass)
Session gate: count of transcripts with mtime > lastConsolidatedAt >= minSessions
   ↓ (pass)
Lock gate:    no other process mid-consolidation (file lock)
   ↓ (pass)
runForkedAgent("/dream — consolidate memories")
```
A `SESSION_SCAN_INTERVAL_MS = 10 * 60 * 1000` throttle prevents rescanning too often when the time gate passes but the session gate doesn’t.
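A minimal sketch of that gate ordering. Only the constant and the gate sequence come from the source; the helper functions are hypothetical names for illustration:

```typescript
declare function readLastConsolidatedAt(): Promise<number> // epoch ms
declare function countRecentTranscripts(sinceMs: number): Promise<number>
declare function acquireDreamLock(): Promise<{ release(): Promise<void> } | null>
declare function runForkedAgent(prompt: string): Promise<void>

const SESSION_SCAN_INTERVAL_MS = 10 * 60 * 1000
let lastScanAt = 0

async function maybeAutoDream(cfg: { minHours: number; minSessions: number }) {
  const now = Date.now()
  if (now - lastScanAt < SESSION_SCAN_INTERVAL_MS) return // throttle rescans
  lastScanAt = now

  // Gate 1 (cheapest): enough wall-clock time since the last consolidation?
  const last = await readLastConsolidatedAt()
  if ((now - last) / 3_600_000 < cfg.minHours) return

  // Gate 2: enough new session transcripts touched since then?
  if (await countRecentTranscripts(last) < cfg.minSessions) return

  // Gate 3 (most expensive): take the file lock last; bail if another
  // process is mid-consolidation.
  const lock = await acquireDreamLock()
  if (!lock) return
  try {
    await runForkedAgent('/dream — consolidate memories')
  } finally {
    await lock.release()
  }
}
```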
Wiki analogy. Layer 4 maps to /wiki-lint — both sweep for contradictions, stale content, and drift that accumulates across sessions. The difference:
| | Claude Code | Your wiki |
|---|---|---|
| Trigger | Automatic, idle-time | Manual /wiki-lint |
| Mechanism | LLM consolidation via /dream prompt | Lint report + human judgment |
| Contradiction resolution | Model merges and rewrites | Human decides which version wins |
| Frequency | After enough sessions accumulate | When you remember to run it |
Contradiction accumulation is inevitable at scale. Claude Code automates resolution at the cost of LLM judgment calls you can’t audit. Your wiki keeps humans in the loop at the cost of consistency that depends on discipline.
Takeaway: Dream runs on a timer; wiki lint is manual. A hook or scheduled timer would keep wiki quality high automatically.
Layer 5: Session Memory (SessionMemory service)
Files: src/services/SessionMemory/
A within-session scratch memory separate from cross-session memdir. Used during compaction — truncateSessionMemoryForCompact() preserves a condensed version of the current session’s memory across a compaction boundary, so the model doesn’t lose the current session context entirely.
getLastSummarizedMessageId() / setLastSummarizedMessageId() tracks which messages have been folded into session memory, preventing re-summarization.
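A sketch of that watermark pattern; the state type and summarize callback are illustrative, not the leaked source:

```typescript
type SessionMemoryState = {
  summary: string
  lastSummarizedMessageId: string | null // the watermark
}

// Fold only messages newer than the watermark into session memory,
// so nothing is summarized twice.
function foldNewMessages(
  state: SessionMemoryState,
  transcript: { id: string; text: string }[],
  summarize: (text: string) => string,
): SessionMemoryState {
  // findIndex misses (-1 + 1 = 0) fall back to folding the whole transcript.
  const start = state.lastSummarizedMessageId
    ? transcript.findIndex(m => m.id === state.lastSummarizedMessageId) + 1
    : 0
  const fresh = transcript.slice(start)
  if (fresh.length === 0) return state // nothing new since the last fold
  return {
    summary: summarize(state.summary + '\n' + fresh.map(m => m.text).join('\n')),
    lastSummarizedMessageId: fresh[fresh.length - 1].id,
  }
}
```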
Lessons for wiki-skills
Three of the five layers have a direct wiki-skill equivalent, validating the wiki system I’m experimenting with:
| Claude Code Layer | Timescale | Wiki-Skills Equivalent |
|---|---|---|
| FileStateCache — tracks which files the model has read and whether it saw a pre-processed view | Within a tool call | No direct equivalent; wiki doesn’t gate edits on read history |
| memdir — persistent semantic memory, scored by relevance at query time | Across sessions | Markdown vault + MEMORY.md index; /wiki-query loads relevant pages |
| extractMemories — forked agent distills durable memories from the session transcript after each turn | End of turn | /wiki-ingest — same distillation, but manual and human-triggered |
| autoDream — background consolidation pass that merges contradictions and stales old facts | Idle time (multi-session) | /wiki-lint — same sweep, but manual |
| SessionMemory — within-session scratch memory preserved across compaction boundaries | Within a session | Conversation context (not persisted) |
At the same time, this deep-dive into Claude Code’s memory management revealed several weaknesses in the current Wiki-Skills workflow. memdir has explicit guards against scale pressure that the wiki currently lacks. As the wiki grows, the same failure modes will appear on a slower timescale:
| Failure mode | memdir’s guard | Wiki equivalent needed |
|---|---|---|
| Index bloat | 200 file scan cap, 5 load cap | Hard cap on index entries per category |
| Topic drift | autoDream consolidation | Scheduled /wiki-lint |
| Query noise | Age-weighted scoring in findRelevantMemories | Recency weighting in /wiki-query |
| Ingest redundancy | Typed categories filter noise | Contradiction sweep on every ingest |
Implications for Wiki-Skills
There’s a deeper implication here beyond structural similarity.
Claude is trained on Claude Code sessions as RL signal. The model has been shaped by the same memory patterns — the index-then-load retrieval, the typed memory categories, the forked-agent extraction flow — that the wiki system replicates. Claude doesn’t just work with these patterns; it has internalized them as the expected shape of a well-run agentic session.
That’s not a coincidence to exploit — it’s a reason the wiki system works as well as it does. When /wiki-query scans an index and loads only relevant pages, Claude recognizes that flow. When /wiki-ingest produces typed, frontmatter-structured memory files, Claude reads them fluently. The format matches what the model was trained to produce and consume.
The wiki system isn’t just inspired by the Claude Code harness; it’s inherently compatible with Claude at the model level.
But this raises the question: how compatible is the wiki system with other models?
