AI Dungeon Master

Memory and Continuity: The Hard Problem for AI DMs

Context windows are not campaign bibles

Dave August 19, 2025

Session ten is where AI campaigns live or die. Early sessions feel magical because the model fills gaps with genre tropes. Later sessions expose contradictions: the sword you looted never existed, the queen you spared orders your arrest without reference to mercy. Continuity is the hard problem — harder than voice, harder than art, harder than prose quality.

Why context windows fail

Even hundred-thousand-token windows are not reliable archives. Models attend unevenly across long contexts, and details buried in the middle are the ones most likely to be missed. Pasting fifty pages of logs each session is expensive, slow, and still does not guarantee the model retrieves the right paragraph. Summarization helps, but summaries compress away the details players care about: which inn you promised to protect, which gem is fake.

Structured memory beats longer prompts

Split memory into layers: immutable house rules, an evolving campaign summary, character sheets with current HP, an NPC registry with revealed flags, and tonight-only instructions. Load each layer explicitly before generation. The dungeonmaster.website campaign brief follows this pattern: the AI reads structured fields, not a scroll of chat.
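As a minimal sketch of the layered approach, the Python below models the five layers as a dataclass and renders them in a fixed order before generation. Every name here (NPC, CampaignMemory, build_prompt, the field names, the section labels) is hypothetical and chosen for illustration; it is not the dungeonmaster.website schema.

```python
from dataclasses import dataclass, field


@dataclass
class NPC:
    name: str
    status: str  # e.g. "alive", "dead", "imprisoned"
    # fact -> True once the players have learned it
    revealed: dict[str, bool] = field(default_factory=dict)


@dataclass
class CampaignMemory:
    house_rules: list[str]      # immutable: never rewritten mid-campaign
    summary: list[str]          # evolving: updated after every session
    characters: dict[str, int]  # character name -> current HP
    npcs: list[NPC]
    tonight: list[str]          # tonight-only instructions


def npc_line(npc: NPC) -> str:
    """One line per NPC, separating what players know from what they don't."""
    known = [fact for fact, seen in npc.revealed.items() if seen]
    hidden = [fact for fact, seen in npc.revealed.items() if not seen]
    return (
        f"{npc.name} ({npc.status}); "
        f"revealed: {', '.join(known) or 'none'}; "
        f"hidden: {', '.join(hidden) or 'none'}"
    )


def build_prompt(mem: CampaignMemory) -> str:
    """Render each layer under its own labeled header, always in the same order."""
    sections = [
        ("HOUSE RULES", mem.house_rules),
        ("CAMPAIGN SUMMARY", mem.summary),
        ("CHARACTERS", [f"{name}: {hp} HP" for name, hp in mem.characters.items()]),
        ("NPCS", [npc_line(n) for n in mem.npcs]),
        ("TONIGHT", mem.tonight),
    ]
    parts: list[str] = []
    for title, lines in sections:
        parts.append(f"## {title}")
        parts.extend(f"- {line}" for line in lines)
    return "\n".join(parts)
```

The point of the fixed order and labeled headers is that the model sees the same structure every session, so a missing or stale layer is obvious to whoever maintains it.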

Player responsibilities

After each session, someone updates the summary in plain language: three bullets of facts, not vibes. "Party owes the Fisher guild 200gp. Baron is alive in cell beneath manor. Ritual clock: two of four segments filled." An AI cannot maintain continuity that players refuse to document.
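One way to keep those bullets factual is to record them as fields first and generate the prose from the fields. The sketch below is an assumption about how that could look, not a description of any existing tool; SessionFacts, to_bullets, to_json, and the session number used in the usage note are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SessionFacts:
    session: int
    debts: dict[str, int]               # creditor -> gold pieces owed
    npc_status: dict[str, str]          # NPC name -> status and location
    clocks: dict[str, tuple[int, int]]  # clock name -> (filled, total segments)


def to_bullets(facts: SessionFacts) -> list[str]:
    """Turn the structured facts into plain-language bullets for the summary layer."""
    bullets = [f"Party owes {who} {gp}gp." for who, gp in facts.debts.items()]
    bullets += [f"{npc} is {status}." for npc, status in facts.npc_status.items()]
    bullets += [
        f"{name} clock: {filled} of {total} segments filled."
        for name, (filled, total) in facts.clocks.items()
    ]
    return bullets


def to_json(facts: SessionFacts) -> str:
    """Serialize for storage; tuples are written as JSON arrays."""
    return json.dumps(asdict(facts), indent=2)
```

Calling to_bullets(SessionFacts(10, {"the Fisher guild": 200}, {"Baron": "alive in cell beneath manor"}, {"Ritual": (2, 4)})) yields three bullets in the same spirit as the example above, and the JSON form gives the next session something to load instead of a chat scroll.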

Retrieval and tools

Lexicon lookup, character queries, and session log search reduce hallucination. When the player asks about a spell, fetch the SRD text. When they ask about an NPC, fetch the campaign record. Retrieval-augmented generation (RAG) without game state is still incomplete, but it is better than pure generation.
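The routing described here can be sketched as a small dispatch table: each kind of question maps to a lookup function, and a miss returns an explicit "no record" message rather than leaving the model to improvise. The dictionaries below are hypothetical stand-ins for a real SRD index and campaign store, and the spell entry is placeholder text, not actual rules.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for real data sources.
SRD_SPELLS = {"example spell": "Placeholder entry; a real tool would return the SRD text."}
NPC_RECORDS = {"baron": "Alive, held in the cell beneath the manor."}


def lookup_spell(name: str) -> Optional[str]:
    return SRD_SPELLS.get(name.lower())


def lookup_npc(name: str) -> Optional[str]:
    return NPC_RECORDS.get(name.lower())


# Question kind -> lookup function.
TOOLS: dict[str, Callable[[str], Optional[str]]] = {
    "spell": lookup_spell,
    "npc": lookup_npc,
}


def retrieve(kind: str, name: str) -> str:
    """Fetch a record to ground the next generation, or say plainly that none exists."""
    tool = TOOLS.get(kind)
    if tool is None:
        raise ValueError(f"unknown lookup kind: {kind!r}")
    record = tool(name)
    if record is None:
        return f"NO RECORD for {kind} '{name}'. Do not invent one; ask the table."
    return record
```

The explicit miss message matters as much as the hits: it gives the model something concrete to say when the campaign record is silent.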

Comparison to human DMs

Human DMs forget too, but social accountability catches errors. When a player says "We saved that family in session two," the DM remembers and corrects course. An AI has no such prompt and needs external records. Do not expect the model to care about continuity; expect your tooling to enforce it.
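As one deliberately simple illustration of tooling that enforces continuity, the check below scans a generated draft for NPCs the campaign record lists as dead and flags them for review before the text reaches the players. The function name and the "dead" status string are assumptions for this sketch, and a name match is only a prompt for a human to look: a flashback or a funeral would trip it too.

```python
import re


def continuity_flags(draft: str, npc_status: dict[str, str]) -> list[str]:
    """Return a warning for each dead NPC whose name appears in the draft."""
    flags = []
    for name, status in npc_status.items():
        # Whole-word, case-insensitive match so "Baron" does not match "Baroness".
        pattern = rf"\b{re.escape(name)}\b"
        if status == "dead" and re.search(pattern, draft, re.IGNORECASE):
            flags.append(f"{name} is recorded as dead but appears in the draft")
    return flags
```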

Continuity is a database problem wearing a narrative costume.

When evaluating AI GM tools, ask: where does session ten live? If the answer is "the chat history," keep looking.
