A Causal Graph as AI Memory

AI Productivity Tools

In a previous post, I described using a context.md file as the AI’s working memory — a flat list of open threads split into “In Flight,” “Blocked,” and “Needs Attention.” Each entry was a one-liner: what it is plus current status.

It worked for about two months. Then it stopped working.

Where Flat Context Breaks

The problem isn’t that a markdown list can’t hold enough items. It’s that it can’t express how things are connected.

Say you notice code quality problems across the team. That observation spawns several threads: a CI pipeline overhaul, a hiring push for a backend engineer, structured 1:1s, and a decision to restrict an AI auto-fix tool that was generating bad PRs. The CI pipeline enables trunk-based development. The hire leads to onboarding. The 1:1s surface a performance issue with one developer, and give another the space to build a bug triage skill. The AI auto-fix restriction evolves into a proper agent-ticket integration.

In context.md, these are separate bullet points:

## In Flight
- CI pipeline — live, main is default branch
- Backend engineer — hired, onboarding in progress
- 1:1s — running across team
- Dev A — performance improvement plan
- Bug triage skill — config-vs-code classifier
- Project management migration — agent integration next
...

Every session, the AI reads this list and knows what’s happening. But it doesn’t know why. It can’t see that the CI pipeline exists because of quality problems, that the bug triage skill emerged from the 1:1s with one developer, or that the project management migration is what unblocks the agent-ticket integration. The backstory gets re-explained in conversations. The file doesn’t carry it.

The second problem is decay. Flat lists don’t have a natural cleanup lifecycle. Resolved items sit there until someone removes them. “Someone” in practice means me, and I have better things to do than garden a markdown file. So stale items linger, and the AI treats them as active.

The Graph

The replacement is a directed graph in DOT format. One file — compass/context/index.dot — maintained entirely by the AI. I never edit it.

Nodes are inflection points. Not routine progress, not daily updates — moments where something actually changed: a decision was made, something shipped, a thread got blocked, or a new workstream spawned from an existing one.

n_20260115_quality_problems [
    label="Code quality problems observed across team",
    timestamp="2026-01-15",
    status="resolved",
    detail="Sloppy code getting through, bugs in production.
            Triggered multiple improvement tracks."
]
 
n_20260129_ci_decision [
    label="Decided: trunk-based dev for new product,
            keep old branching model for legacy",
    timestamp="2026-01-29",
    status="resolved",
    detail="New product needs fast iteration.
            Legacy needs stability."
]
 
n_20260205_ci_live [
    label="CI pipeline live — main is default branch",
    timestamp="2026-02-05",
    status="resolved",
    detail="Develop branch deleted.
            Features must work when merged."
]

Edges encode causation:

n_20260115_quality_problems -> n_20260129_ci_decision [type="led_to"]
n_20260129_ci_decision -> n_20260205_ci_live [type="led_to"]
n_20260205_ci_live -> n_20260223_stacked_prs [type="enabled"]

Five edge types cover most relationships: led_to, spawned, enabled, blocked_by, invalidated. Subgraphs cluster related nodes into workstreams. Cross-thread edges connect workstreams when they interact — a hire in one thread enables a capability in another.
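As a sketch of how that looks on the page (the node names here are illustrative, not taken from the real index.dot):

```dot
// Hypothetical workstream cluster. Subgraphs group a thread's nodes;
// edges outside any cluster connect workstreams to each other.
subgraph cluster_code_quality {
    label="Code quality workstream"

    n_20260115_quality_problems -> n_20260129_ci_decision [type="led_to"]
    n_20260115_quality_problems -> n_20260120_structured_1on1s [type="spawned"]
}

// Cross-thread edge: a hire in one workstream enables a capability in another
n_20260201_backend_hire -> n_20260210_oncall_rotation [type="enabled"]
```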

The convention that makes it work: leaves are the current state. A node with no outgoing edges to other active nodes is where things stand right now. Trace backwards from any leaf, and you get the causal chain of how it got there.
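The leaf convention is simple enough to sketch in code. This is a minimal illustration using plain dicts rather than a DOT parser, with hypothetical node names; I'm also assuming a leaf must itself be active, and that each node has a single causal parent:

```python
# Toy graph: adjacency list of outgoing edges, plus a status per node.
edges = {
    "quality_problems": ["ci_decision"],
    "ci_decision": ["ci_live"],
    "ci_live": ["stacked_prs"],
    "stacked_prs": [],
}
status = {
    "quality_problems": "resolved",
    "ci_decision": "resolved",
    "ci_live": "resolved",
    "stacked_prs": "active",
}

def leaves(edges, status):
    """Current state: active nodes with no outgoing edges to other active nodes."""
    return [
        n for n in edges
        if status[n] == "active"
        and not any(status[child] == "active" for child in edges[n])
    ]

def trace_back(node, edges):
    """Walk incoming edges from a leaf to reconstruct its causal chain.
    Assumes one causal parent per node, for simplicity."""
    parents = {child: parent for parent, children in edges.items() for child in children}
    chain = [node]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return list(reversed(chain))
```

Here `leaves` picks out `stacked_prs`, and `trace_back("stacked_prs", edges)` walks back through the CI decision to the original quality observation.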

What Gets a Node

Not everything. This was the hardest convention to get right.

A node is an inflection point — a moment that changed the trajectory of a workstream. Specifically:

  • A decision was made
  • Something shipped or went live
  • A thread got blocked or unblocked
  • A new thread was spawned from an existing one
  • New information changed the picture

“Made progress on feature X” is not a node. “Feature X shipped” is. “Had a good conversation with Y” is not a node. “Y accepted the role change” is. The graph captures the shape of what happened, not the texture.

The AI makes this judgment call on its own, and it’s right about 90% of the time. The other 10% is usually over-persisting — creating a node for something that was really just routine progress. That’s the easier error to correct.

Pruning and Archiving

This is where the flat-list decay problem gets solved.

When all leaves of a subgraph are resolved or abandoned, and the subgraph is older than roughly three weeks, it gets collapsed: the full graph moves to an archive file, and a single summary node stays in the index.
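The collapse rule reduces to a small predicate. A minimal sketch, assuming a dict-based representation and an illustrative three-week threshold (the real system works on the DOT file directly):

```python
from datetime import date, timedelta

def archivable(nodes, edges, today, max_age=timedelta(weeks=3)):
    """True when every leaf is resolved/abandoned and the newest node is old.

    nodes: {name: {"status": ..., "timestamp": date}}
    edges: {name: [child names]}
    """
    leaf_names = [n for n in nodes if not edges.get(n)]
    all_closed = all(nodes[n]["status"] in ("resolved", "abandoned") for n in leaf_names)
    newest = max(nodes[n]["timestamp"] for n in nodes)
    return all_closed and (today - newest) > max_age

# Hypothetical two-node subgraph, both resolved.
subgraph_nodes = {
    "n_ci_decision": {"status": "resolved", "timestamp": date(2026, 1, 29)},
    "n_ci_live": {"status": "resolved", "timestamp": date(2026, 2, 5)},
}
subgraph_edges = {"n_ci_decision": ["n_ci_live"], "n_ci_live": []}
```

Run in mid-March, this subgraph qualifies for collapse; run a week after its last node, it doesn't.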

n_archive_agent_pipeline [
    label="Agent pipeline — proven and working in production",
    timestamp="2026-02-08 to 2026-03-19",
    status="resolved",
    detail="Research, state machine, QA agent.
            Pipeline working hands-off.
            See compass/context/archive/agent-pipeline.dot"
]

The archive is still there if the AI needs to trace deep history. But the working graph stays focused on what’s active. Currently I have eight archived subgraphs compressed into summary nodes, and the index stays under 300 lines.

The cleanup is natural — it follows from the status conventions rather than requiring manual gardening.

Why DOT

People ask why not JSON or YAML. A few reasons.

DOT is scannable. Open the file and the structure is immediately visible — nodes, edges, subgraphs, labels. JSON would be a wall of nested objects. YAML would be fragile about indentation.

Claude reasons well about DOT. It can parse the graph structure, follow edges, identify leaves, and produce valid updates without a schema definition or parsing library. The format is simple enough that an LLM handles it natively.

It renders. If you ever want to visualize the graph, dot -Tpng index.dot -o graph.png works. I rarely do this, but it’s useful for debugging when subgraphs get tangled.

And it has just enough structure without being rigid. Nodes have attributes (label, timestamp, status, detail) but the format doesn’t enforce a schema. If I need a new attribute, I add it. No migrations.

What It Enables

The graph changes what the AI can do at session start. Instead of reading a flat list and knowing what’s happening, it reads a causal narrative and knows the story.

This matters most for advice and planning. When I ask about a workstream, the AI doesn’t just report status — it can explain the causal chain, point to decisions that constrained the current state, and notice when threads that should be connected aren’t.

It also makes session handoffs sharper. The previous post described context.md as “briefing notes for an incoming shift.” The graph is more like handing over a case file — not just the current situation, but the chain of events that produced it.

Two specific things it catches that flat files miss:

Stale threads. A node sitting at active status with no recent children is visibly stale — there’s been no inflection point in weeks. The graph makes this structural rather than requiring someone to notice a dusty bullet point.

Cross-thread dependencies. When one workstream blocks or enables another, that edge exists in the graph. The AI can surface it: “the project management migration you’ve been putting off is what unblocks the bug triage workflow.” In a flat list, those are just two unrelated bullet points.
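The staleness check in particular is mechanical once the graph exists. A sketch under the same assumptions as before (dict-based graph, illustrative three-week threshold, hypothetical node names):

```python
from datetime import date, timedelta

def stale_nodes(nodes, edges, today, threshold=timedelta(weeks=3)):
    """Active nodes whose subtree has produced no inflection point recently.

    nodes: {name: {"status": ..., "timestamp": date}}
    edges: {name: [child names]}
    """
    def newest_descendant(n):
        # Most recent timestamp anywhere in this node's subtree.
        best = nodes[n]["timestamp"]
        for child in edges.get(n, []):
            best = max(best, newest_descendant(child))
        return best

    return [
        n for n in nodes
        if nodes[n]["status"] == "active"
        and (today - newest_descendant(n)) > threshold
    ]

# One thread quiet since January, one with a fresh inflection point.
sample_nodes = {
    "n_perf_plan": {"status": "active", "timestamp": date(2026, 1, 10)},
    "n_bug_triage": {"status": "active", "timestamp": date(2026, 3, 1)},
}
sample_edges = {"n_perf_plan": [], "n_bug_triage": []}
```

Checked on March 10, the quiet thread is flagged and the fresh one isn't: staleness falls out of the structure instead of depending on someone noticing.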

Tradeoffs

It’s an AI-only file. I don’t read or edit the graph. This is by design — the AI maintains it as part of session wrap-up, and I review the effects (better session starts, better advice) rather than the artifact. But it means the memory system is opaque to me. If the AI misrepresents something, I might not notice until it gives bad advice.

Inflection point judgment is imperfect. The AI tends to over-persist rather than under-persist. A weekly conversation that was really just a status check sometimes gets a node. The fix is easy — delete or merge the node — but it requires noticing.

DOT gets big. Without the archiving lifecycle, the graph would grow indefinitely. Even with archiving, a busy quarter can produce a 400-line index. The subgraph structure keeps it navigable, but there’s a complexity ceiling.

It requires the session-end hook. The graph doesn’t maintain itself — it’s updated by a background agent that runs after each session. If that hook fails or isn’t configured, the graph goes stale. The system has a dependency on infrastructure that a markdown file doesn’t.

The Evolution

The progression was: no context → flat markdown list → causal graph. Each step solved a real problem with the previous approach.

The flat list solved cold starts — the AI stopped asking “what are you working on?” every session. The graph solved narrative loss — the AI stopped asking “why are you doing it this way?” and “what happened with X?”

Whether you need the graph depends on how many concurrent threads you’re managing and how long they live. If you’re tracking three things over two weeks, context.md is fine. If you’re tracking twenty things over three months with causal dependencies between them, the flat file will eventually fail you the same way it failed me.

The building blocks haven’t changed from the original system: plain text, simple conventions, full transparency. The graph is just a better data structure for the job.