A Forcing Function for AI Memory

I ended the last post on a tradeoff I hadn’t solved: the graph grows. The session-end hook persists context automatically, which is the whole point — but persistence without bounds means the files only get longer. A 2000-line context file is a tax. An LLM reading it at the start of every session is slower, more expensive, and measurably worse at finding the thing that matters.

A system that maintains its own memory eventually needs to maintain its maintenance. Here’s the part I left out.

The Problem With Self-Writing Memory

Two failure modes, both structural.

Unbounded growth. Every session appends. Decisions, session summaries, graph nodes — they accumulate. Nothing removes them, because the writing agent’s job is to write, not to garden. Left alone, the working set grows until it’s actively degrading the thing it’s supposed to help.

No natural lifecycle. Old content doesn’t know it’s old. A decision from three months ago sits in the same file as one from yesterday, weighted equally, until something decides otherwise. In a flat system, that “something” is me, manually. Which means it doesn’t happen.

The obvious fix is a size limit enforced at commit time: lint the files, fail the commit if they’re too big. This is wrong, and the reason it’s wrong is the whole point of the design.

Don’t Gate. Signal.

The commit is how a session’s memory gets saved. The session-end hook writes the new context, then commits and pushes. If a lint fails that commit when files are over budget, you’ve built a system that throws away the session’s memory in order to protect the memory. The failure mode is exactly backwards.

So the lint never blocks. It always exits 0. From the top of the script:

The forcing function is the memory-steward agent in the session-end hook, not the commit gate — auto-commits never block.

All it does is count lines against budgets and, on overage, drop a marker file:

declare -a BUDGETS=(
  "compass/sessions.md:500"
  "compass/decisions.md:400"
  "compass/context/index.dot:800"
  "compass/self/index.dot:600"
)

When something is over, it writes compass/.budget-pressure.json:

{
  "generated_at": "2026-05-24T14:30:00Z",
  "violations": [
    { "file": "compass/sessions.md", "lines": 523,
      "budget": 500, "oldest_promotable_date": "2026-04-27" }
  ]
}

That’s it. The lint detects; it doesn’t remediate. Decoupling those two is what lets the commit always succeed while still creating pressure to clean up. The marker is a tripwire, not a verdict.

The Janitor

A separate agent — the steward — runs at session end, but only if the marker file exists. No pressure, no steward. When it does run, its job is one thing: promote the oldest content out of working memory and into a quarterly archive, until the files are back under budget.

The operative word is promote. Memory moves down a tier; it’s never lost.

Short-term (working memory). Recent raw signal: the session log, the active context graph. This is what gets read at session start, so this is what has budgets.
Mid-term (consolidated). Distilled history: quarterly session files, archived decisions, collapsed graph subgraphs. The steward writes here.
Long-term (evergreen). Goals, preferences, references. Hand-curated. The steward never touches it.

When the session log goes over budget, the steward doesn’t just truncate the bottom — it takes the weeks that have aged out and rewrites each one as a single distilled paragraph in the quarterly archive, then deletes the day-by-day entries from the working file. The detail drops to a tier nobody reads at session start; the gist survives where history lives.

Promotion With Judgment

The part that makes this more than a tail-and-truncate script is that the steward reasons about what’s safe to move.

Take the decision log. The naive rule is “archive anything older than 90 days.” But age isn’t relevance. A decision made four months ago might still be the thing constraining active work today. So before promoting a decision, the steward cross-checks the context graph: it finds the node that decision produced and looks at its status.

If the node is still active or blocked, the decision stays in working memory — open work needs its rationale close at hand, regardless of age. Only when the node is resolved, abandoned, or superseded does the decision get archived.

That judgment is why the forcing function is an agent and not a rule. “Is this still live?” isn’t a line count. It’s a question you have to read the graph to answer — and encoding it as a threshold would be a worse version of asking a model that can just read the graph and tell you.

The Whole Loop

Stripped to its shape, every session ends like this:

writers     append the session's memory, in parallel
   ↓
lint        recount lines → drop a marker if anything is over budget
   ↓
steward     runs only if the marker exists → promote oldest content
   ↓
commit      rebase + retry, behind a single-instance lock

The writers are capped at a dollar each, and the whole thing sits behind a flock so concurrent session-ends queue instead of racing the git index. It runs after I close the terminal. I never see it happen — I see the effects: a context file that stays under 800 lines no matter how busy the quarter gets, and a session that starts sharp instead of wading through three months of residue.

Why This Shape

Three decisions did most of the work.

Detect and remediate separately. The lint is dumb and never blocks; the steward is smart and runs conditionally. A dumb tripwire firing a smart cleanup beats one clever gate that risks the commit it’s attached to.

Move, don’t delete. Every promotion is a transfer to a lower tier. The working set shrinks; the history is intact. The system can always trace deep when it needs to — it just doesn’t carry that weight into every session.

Put judgment in an agent, thresholds in a script. Line counts decide when to act. An agent reading the graph decides what to act on. The cheap, deterministic check triggers the expensive, reasoned one — and only when it’s actually needed.

Tradeoffs

Opacity compounds. I already don’t read the context graph. Now there’s a janitor rewriting it on its own judgment, too. It’s agents maintaining agents’ memory. The mitigation is the same as it’s always been — plain text, in git, every change diffable — but the surface I’d have to audit to catch a quiet misjudgment keeps growing.

The thresholds are guesses. 500 lines, 90 days, three weeks. I tuned them by feel, not measurement. They’re probably wrong by some margin I haven’t bothered to find — though “wrong” here just means the steward runs a little more or less often than ideal, which is a cheap kind of wrong.

It’s another thing that can silently stop. A flat file you edit by hand has no failure mode. This has a hook, a lint, and a conditional agent, any of which can break without announcing it. If the steward quietly stops running, nothing fails — the files just start growing again, and I won’t notice until a session feels sluggish.

The Principle

The series has had one throughline: plain text, simple conventions, AI that maintains its own state. This is the next turn of it. A system that writes its own memory will, without intervention, bury itself in that memory. You can’t solve that by writing less — the volume is doing real work. A forcing function solves it instead: a signal that costs nothing and blocks nothing, paired with a janitor that has enough judgment to act on it.

The building blocks haven’t changed. The new idea is small and, in hindsight, obvious: self-maintaining memory needs a self-maintaining cleanup — one that reasons about what to keep instead of blindly truncating.