Why selective forgetting beats compaction (and subagents) for long tasks

June 8, 2026

Give an agent a real task — refactor a module, chase a bug across a dozen files, get a build green — and the thing that eventually stops it isn’t reasoning. It’s the context window filling up with stuff it already finished reading.

The problem: stale tool output piles up

A long run is mostly tool calls: read a file, grep the codebase, run the build. Each one dumps its full output into the conversation — the entire file, the whole grep, the 200-line build log — and most of it goes stale almost immediately: the agent found the one function it cared about and moved on. The dead weight accumulates, and halfway through a real task the window is mostly spent tool output with a thin thread of reasoning running through it. When it fills, the run degrades or simply stops. The work wasn’t too hard — the agent buried itself in its own scrollback.

What Zucchini does: Dynamic Context¹

Zucchini lets the agent forget, on purpose and selectively. As it works, it can call a prune-context action on its own tool outputs once they are no longer needed: the 300-line file it already extracted one function from, the grep it already read, the build log from three steps ago. The full output is dropped and replaced with a one-line digest of what it learned (“checked auth.rs, the token refresh is in refresh()”). The reasoning stays; the bulk goes.

You watch it happen. Every prune shows up right in the chat timeline as a small frame — “context pruned · ~12k freed” — so it’s never a black box. The agent decides what’s spent, surgically, byte-for-byte, and keeps going on a leaner context. It works for Claude Code, Codex, Gemini, and Cursor. And a prune isn’t a one-way door: if a dropped output turns out to be needed again, the digest says what it was — the agent just re-reads the file or re-runs the command.
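To make the mechanics concrete, here is a minimal in-memory sketch of the idea in Python. It is not Zucchini's implementation: the `Message` shape, the `prune_context` function signature, the `[pruned]` prefix, and counting freed space in characters rather than tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    """One conversation entry. The shape is hypothetical, not a real CLI format."""
    id: str
    role: str  # "assistant", "user", or "tool"
    content: str
    pruned: bool = False


def prune_context(history: list[Message], digests: dict[str, str]) -> tuple[list[Message], int]:
    """Swap selected tool outputs for one-line digests.

    `digests` maps a tool message id to the digest the agent wrote for it.
    Returns the leaner history and the number of characters freed.
    """
    freed = 0
    lean = []
    for msg in history:
        digest = digests.get(msg.id)
        # Only tool outputs are eligible; reasoning and user turns stay untouched.
        if digest is not None and msg.role == "tool" and not msg.pruned:
            stub = f"[pruned] {digest}"
            freed += len(msg.content) - len(stub)
            msg = replace(msg, content=stub, pruned=True)
        lean.append(msg)
    return lean, freed


# Example: the agent read a large file, found what it needed, and prunes the read.
history = [
    Message("m1", "assistant", "I'll read auth.rs to find the token refresh."),
    Message("t1", "tool", "fn refresh() { /* ... */ }\n" * 300),
    Message("m2", "assistant", "The refresh logic lives in refresh()."),
]
lean, freed = prune_context(
    history, {"t1": "checked auth.rs, the token refresh is in refresh()"}
)
```

The digest keeps enough of a pointer (which file, which function) that the agent can re-read the original if it turns out to matter again, which is what makes the prune reversible in practice.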

Why it beats compaction

The usual answer to a full window is compaction: when you hit a threshold, rewrite the entire history into a summary and continue from that. It works, but it’s blunt in three ways.

You don’t choose what survives. The spent build log and the chain of reasoning you still need get flattened with equal violence.
Fidelity is lost everywhere. A summary is lossy by construction: the decision you made forty turns ago, the exact error string you’re still hunting — all of it comes back fuzzier.
It fires on a schedule, not on a fact. It triggers when the window crosses a line, not when a specific output actually became useless.

Selective forgetting inverts all three: only the specific spent outputs go, everything else stays at full fidelity, and each prune is visible in the timeline instead of an opaque summarize-and-pray.

Why it beats spawning subagents (for this)

The other stock answer is “just use subagents” — offload chunks of work to fresh contexts so the parent stays light. For the right shape of problem that’s genuinely the best tool, and it’s worth being fair about it: if your task fans out into independent, parallelizable subtasks, subagents win outright. Run them side by side, each in its own clean window.

But one long, coherent task is the wrong shape for that. The whole point of such a task is continuity — the accumulated understanding of how these pieces fit. Hand a slice to a subagent and:

State fragments. The subagent can’t see the main thread’s built-up context, so it re-derives or guesses what the parent already knew.
You pay lossy hand-offs both ways. Summarize the task down into the subagent, summarize its result back up — two compactions, the same fidelity tax, twice.
The parent still bloats. Every returned summary lands back in the parent window, along with the orchestration overhead of managing the hand-off.

Selective forgetting keeps a single agent with the full thread of reasoning intact while staying lean. Subagents for parallel fan-out; one lean agent for one long task.

Under the hood — and a feature request

None of the agent CLIs let you edit a live session’s context, so Zucchini does it the hard way. When the agent calls prune-context, the spawner stops the resident process, rewrites the session transcript on disk — each pruned tool output replaced by its one-line digest — and respawns the CLI, resuming from the edited transcript. Rewriting history also invalidates the provider’s prompt cache from that point on (compaction pays the same cost — any rewrite does), which is why prunes are applied in batches rather than one output at a time.
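The transcript-rewrite step might look roughly like the Python sketch below. Everything about the file format here is an assumption: it pretends the session is a JSONL file whose entries carry hypothetical `id`, `type`, and `content` fields, which will not match any particular CLI's real transcript layout. Stopping and respawning the CLI process is left out; `apply_prune_batch` is an illustrative name, not Zucchini's API.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def apply_prune_batch(transcript: Path, digests: dict[str, str]) -> Optional[int]:
    """Rewrite a JSONL transcript, swapping pruned tool outputs for digests.

    Assumes the agent process is already stopped. Returns the index of the
    earliest rewritten entry (a cached prompt prefix can only still match
    before that point), or None if nothing was changed.
    """
    lines = transcript.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]

    first_changed = None
    for i, entry in enumerate(entries):
        digest = digests.get(entry.get("id"))
        # Hypothetical schema: only entries typed "tool_result" are prunable.
        if digest is not None and entry.get("type") == "tool_result":
            entry["content"] = digest
            if first_changed is None:
                first_changed = i

    if first_changed is None:
        return None

    # Write to a temp file in the same directory, then swap it in atomically
    # so a crash mid-write never leaves a half-written transcript behind.
    fd, tmp_path = tempfile.mkstemp(dir=transcript.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        for entry in entries:
            tmp.write(json.dumps(entry) + "\n")
    os.replace(tmp_path, transcript)
    return first_changed
```

Passing several digests in one call is the batching described above: the process is stopped once, the file is rewritten once, and the cache is invalidated once from the earliest edited entry, rather than paying all three costs for every pruned output.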

We’d love to retire this machinery. Selective forgetting belongs inside the agents themselves — a first-class “drop this tool result, keep this digest” operation in Claude Code, Codex, Gemini CLI, and friends would make the restart unnecessary and the cache cost smaller. Consider this post a feature request.

¹ We arrived at selective forgetting independently, but we weren’t the first to publish the idea: opencode-dynamic-context-pruning introduced a proof of concept earlier.

Claude and Claude Code are trademarks of Anthropic, PBC. Zucchini is not affiliated with or endorsed by Anthropic.