June 8, 2026
Give an agent a real task — refactor a module, chase a bug across a dozen files, get a build green — and the thing that eventually stops it isn’t reasoning. It’s the context window filling up with stuff it already finished reading.
A long run is mostly tool calls. The agent reads a file, greps the codebase, runs the build, reads the next file. Every one of those dumps its full output back into the conversation: the entire file, the whole grep, the 200-line build log. And here’s the thing — most of it goes stale almost immediately. The agent skimmed the file, found the one function it cared about, and moved on. The other 300 lines are now dead weight sitting in the window, costing tokens on every single turn that follows.
They accumulate fast. Halfway through a real task the window is most tool output and a thin thread of actual reasoning running through it. When it fills, the run degrades — the model starts losing the early thread — or it simply stops. The work wasn’t too hard. The agent just buried itself in its own scrollback.
Zucchini lets the agent forget — on purpose, and selectively. As it works, it can call a prune-context action on its own no-longer-needed tool outputs: that 300-line file it already extracted the one function from, the grep it already read, the build log from three steps ago. The full output is dropped and replaced with a one-line digest of what it learned (“checkedauth.rs, the token refresh is in refresh()”). The reasoning stays; the bulk goes.
You watch it happen. Every prune shows up right in the chat timeline as a small frame — “context pruned · ~12k freed” — so it’s never a black box. The agent decides what’s spent, surgically, byte-for-byte, and keeps going on a leaner context. It works for Claude Code, Codex, Gemini, and Cursor.
The usual answer to a full window is compaction: when you hit a threshold, rewrite the entire history into a summary and continue from that. It works, but it’s blunt in three ways.
Selective forgetting inverts all three. It removes only the specific spent outputs and leaves the reasoning, the decisions, and recent state at full fidelity — nothing important is rewritten, because nothing important is touched. And because you see each prune in the timeline, you know exactly what was dropped instead of getting an opaque summarize-and-pray.
The other stock answer is “just use subagents” — offload chunks of work to fresh contexts so the parent stays light. For the right shape of problem that’s genuinely the best tool, and it’s worth being fair about it: if your task fans out into independent, parallelizable subtasks, subagents win outright. Run them side by side, each in its own clean window.
But one long, coherent task is the wrong shape for that. The whole point of such a task is continuity — the accumulated understanding of how these pieces fit. Hand a slice to a subagent and:
Selective forgetting keeps a single agent with the full thread of reasoning intact while staying lean — continuity without the bloat. No hand-off, no fragmentation, no re-deriving what it already knew. Subagents for parallel fan-out; one lean agent for one long task.
This is the reason a chat in Zucchini can run long without grinding to a halt. The agent manages its own memory as it goes, and you watch it do exactly that — each “context pruned” frame in the timeline is the agent making room to keep going, in plain sight. One coherent conversation, as long as the task needs, kept lean by the agent itself. That’s Zucchini.
Claude and Claude Code are trademarks of Anthropic, PBC. Zucchini is not affiliated with or endorsed by Anthropic.