TL;DR
Agent transcripts are useful evidence, but they are not a knowledge base by
themselves. The practical pattern is a promotion ladder: raw logs become
session cards, session cards become a searchable index, repeated lessons
become learning candidates, and only reviewed rules become team knowledge.
The problem: agent work disappears
If you use Claude Code, Codex, Cursor, or any serious coding agent every day,
you have probably seen this failure mode:
an agent spends two hours debugging a tool quirk
the fix works
the reasoning stays inside the chat transcript
another session repeats the same mistake a week later
The team did not forget because nobody cared.
It forgot because the knowledge never left the transcript.
That is the real problem.
Modern coding agents leave behind a lot of raw evidence: messages, tool calls,
commands, file edits, errors, summaries, retries, and decisions. Claude Code
stores local session transcripts as JSONL files under project-specific
directories. Codex has its own session and thread state. Other agent tools have
similar traces.
But a pile of logs is not memory.
Memory is what you have once the team can answer:
what did we learn
where is the evidence
when should this change future behaviour
which rule should the next agent actually follow
Why raw transcript search is not enough
The obvious first attempt is to grep old transcripts.
That works for emergencies, but it does not scale as a daily workflow.
Raw transcripts are noisy:
tool output dominates the useful conversation
failed commands and retries create duplicate surface area
temporary theories appear next to final conclusions
long logs make small decisions hard to find
local paths, tokens, and environment details can create privacy risk
They are also hard to trust.
If a transcript says "this is fixed", that is not the same as a durable rule.
Maybe the agent was wrong. Maybe the user corrected it later. Maybe the fix was
only true for one repo, one date, or one provider version.
The goal is not to preserve every word.
The goal is to preserve the minimum useful evidence that helps the next session
avoid repeating work.
The five-layer knowledge ladder
A practical agent knowledge base has five layers.
1. Raw logs
Raw logs are the evidence layer.
They answer:
what was said
what tools ran
what commands, edits, and errors actually occurred
Keep raw logs local and treat them as sensitive. Do not casually paste them into
new model contexts. Do not upload them wholesale into a third-party vector
database unless your privacy model explicitly allows that.
Raw logs are for audit and reconstruction, not for everyday retrieval.
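Listing that local evidence takes only a few lines of Python. The path below assumes Claude Code's default layout under ~/.claude/projects, which may differ in your setup; Codex and other tools keep their session state elsewhere:

from pathlib import Path

# Assumed default location for Claude Code session transcripts; adjust for your setup.
TRANSCRIPT_ROOT = Path.home() / ".claude" / "projects"

# Newest-first, so the most recent sessions get summarized first.
transcripts = sorted(
    TRANSCRIPT_ROOT.glob("*/*.jsonl"),
    key=lambda p: p.stat().st_mtime,
    reverse=True,
)

for path in transcripts[:20]:
    print(path)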
2. Session cards
A session card is one compact summary of a work session.
It should be readable in under a minute.
Good fields:
{
  "date": "2026-05-10",
  "project": "example-nextjs",
  "task": "Debug missing SEO analytics data",
  "outcome": "GSC domain property worked; URL-prefix property returned 403",
  "files_touched": ["docs/runbooks/seo/playbook.md", "AGENTS.md"],
  "tools_used": ["GSC MCP", "GA4 MCP", "git"],
  "decisions": ["Use sc-domain property for future GSC calls"],
  "failure_modes": [
    "URL-prefix property returned 403 despite domain property access"
  ],
  "evidence": [
    "GSC query against sc-domain:example.com succeeded",
    "GSC query against https://example.com failed with 403"
  ]
}
This is the first useful compression step.
The transcript remains available, but the next agent should start from the card.
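Writing the card can stay trivially simple. A minimal sketch, assuming one cards.jsonl file per project (the filename and fields are illustrative, not a fixed schema):

import json
from pathlib import Path

CARDS = Path("memory/cards.jsonl")  # illustrative location

def append_card(card: dict) -> None:
    """Append one session card as a single JSONL line."""
    CARDS.parent.mkdir(parents=True, exist_ok=True)
    with CARDS.open("a", encoding="utf-8") as f:
        f.write(json.dumps(card, ensure_ascii=False) + "\n")

append_card({
    "date": "2026-05-10",
    "project": "example-nextjs",
    "task": "Debug missing SEO analytics data",
    "outcome": "GSC domain property worked; URL-prefix property returned 403",
    "decisions": ["Use sc-domain property for future GSC calls"],
})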
3. Search index
Once you have session cards, search the cards first.
That can be simple:
keyword search over JSONL
SQLite FTS
DuckDB over JSONL
local embeddings over session summaries
Do not start by embedding full transcripts.
Embed the distilled session cards first.
Most future questions are not:
Show me every token from the old conversation.
They are:
Have we seen this failure before?
or:
Which session decided this property ID?
Session cards answer those questions faster and with less privacy exposure.
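One minimal sketch of that search layer uses SQLite FTS5 over the distilled cards. It assumes the cards.jsonl layout shown above; FTS5 availability depends on how your SQLite was built.

import json
import sqlite3
from pathlib import Path

con = sqlite3.connect("memory/cards.db")
con.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS cards USING fts5(date, project, task, outcome, decisions)"
)

# Naive full rebuild keeps the sketch idempotent.
con.execute("DELETE FROM cards")
for line in Path("memory/cards.jsonl").read_text(encoding="utf-8").splitlines():
    card = json.loads(line)
    con.execute(
        "INSERT INTO cards VALUES (?, ?, ?, ?, ?)",
        (
            card.get("date", ""),
            card.get("project", ""),
            card.get("task", ""),
            card.get("outcome", ""),
            " ".join(card.get("decisions", [])),
        ),
    )
con.commit()

# "Have we seen this failure before?" becomes a one-line query.
for row in con.execute("SELECT date, project, task FROM cards WHERE cards MATCH ?", ("403 property",)):
    print(row)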
4. Learning candidates
Some session cards contain a reusable lesson.
Most do not.
A learning candidate is a proposed rule that might deserve reuse:
a tool failure pattern
a workflow anti-pattern
a security rule
a provider-specific gotcha
a repeated review finding
a deployment or analytics trap
For example:
If a GSC connector returns 403 for a URL-prefix property, test the domain
property before concluding the connector is broken.
That is not just a session summary.
It is a candidate operating rule.
Candidates should carry:
summary
scope
confidence
evidence links
date
owner or source
review status
They should not automatically become policy.
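A candidate record can live in the same JSONL habit as the cards. The field names below are illustrative rather than a fixed schema; what matters is that scope, evidence, and review status travel with the rule:

{
  "summary": "If a GSC connector returns 403 for a URL-prefix property, test the domain property before concluding the connector is broken.",
  "scope": "example-nextjs repo, GSC MCP connector",
  "confidence": "medium",
  "evidence": [
    "2026-05-10 session card",
    "GSC query against sc-domain:example.com succeeded"
  ],
  "date": "2026-05-10",
  "source": "agent session, reviewed by a human",
  "status": "candidate"
}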
5. Promoted knowledge
Promoted knowledge is what the next agent should actually follow.
This belongs in:
runbooks
repo docs
framework instructions
checklists
test fixtures
lint rules
typed configuration
Promotion is a judgment step.
A lesson should be promoted only when it is:
repeatable
specific
verified
likely to save future work
safe to generalize
That is the difference between "the agent said something" and "the team should
now operate differently".
A concrete example: the analytics property trap
Here is the kind of lesson that should survive a transcript.
An agent tries to pull Search Console data for a site.
It uses:
https://example.com
The API returns 403.
The easy but wrong conclusion:
The connector does not have access.
The better investigation:
test the domain property
test the URL-prefix property
compare which one the connector can read
document the working property string
update the SEO runbook
The durable lesson:
For this site, use sc-domain:example.com for GSC calls. The URL-prefix
property returns 403 because the connector account has domain-property
access, not URL-prefix access.
That is a good promoted rule because it is:
concrete
easy to verify
likely to recur
short enough to place in a runbook
useful to both humans and agents
The raw transcript is no longer the primary interface to the lesson.
The runbook is.
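The investigation itself can also be scripted so the next person does not redo it by hand. A sketch using the official Search Console API client; the service-account file, dates, and property strings are illustrative, and a connector that goes through an MCP server would handle auth differently:

from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# Illustrative credentials setup.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
gsc = build("searchconsole", "v1", credentials=creds)

body = {"startDate": "2026-04-01", "endDate": "2026-05-01", "dimensions": ["query"], "rowLimit": 1}

# Try both property formats and record which one this account can actually read.
for prop in ["sc-domain:example.com", "https://example.com"]:
    try:
        gsc.searchanalytics().query(siteUrl=prop, body=body).execute()
        print(f"{prop}: accessible, use this in the runbook")
    except HttpError as err:
        print(f"{prop}: failed ({err.resp.status})")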
What a session card should include
The card should be boring on purpose.
Useful fields:
Task: Helps future search match intent, not only filenames.
Repo or project: Prevents a local rule from leaking into every codebase.
Date: Makes stale provider behaviour easier to detect.
Tools used: Surfaces broken MCPs, flaky browsers, and repeat failure zones.
Files touched: Connects reasoning to actual artifacts.
Outcome: States what changed or what was learned.
Decisions: Captures judgment, not just activity.
Failure modes: Prevents repeated dead ends.
Evidence: Keeps the card auditable.
Keep the card short.
If it needs three pages, it is probably a report, not a card.
What not to index
Do not treat agent memory as an excuse to retain everything.
Avoid indexing:
.env values
token stores
API keys
full credential configs
private customer data
large raw command outputs
full stack traces unless they are the artifact being studied
encrypted or hidden reasoning payloads
temporary chain-of-thought-like scratch material
The safest default is:
index summaries
store evidence pointers
keep raw logs local
retrieve transcript snippets only when needed
That keeps the knowledge base useful without turning it into a liability.
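An evidence pointer can be as small as a reference back into a local transcript or a commit, never a copy of the content. One illustrative shape (the path is hypothetical):

{
  "kind": "transcript_span",
  "path": "~/.claude/projects/example-nextjs/2026-05-10.jsonl",
  "lines": "412-448",
  "note": "GSC 403 on URL-prefix property"
}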
How to extract useful knowledge
Use three passes.
Pass 1: summarize the session
Create one short card per session.
Extract:
task
outcome
files touched
tools used
important errors
final status
This pass should be conservative.
It is allowed to say "no useful lesson".
Pass 2: classify operational signals
Tool calls often carry the best reusable evidence.
Classify:
failed tool calls
retry loops
edit failures
missing permissions
provider errors
broken configuration
commands that fixed the problem
This turns "the agent struggled" into a concrete pattern:
The Search Console URL-prefix property fails, but the domain property works.
or:
The screenshot tool was not the issue. The page had not finished hydrating.
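A sketch of that digest over one transcript. The event fields used here (type, name, is_error) are assumptions for illustration; real schemas differ between Claude Code versions and between tools:

import json
from collections import Counter
from pathlib import Path

def tool_failure_digest(transcript: Path) -> Counter:
    """Count failed tool calls per tool name in one transcript JSONL file."""
    failures = Counter()
    for line in transcript.read_text(encoding="utf-8").splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        # Field names are assumed; adjust to the schema your agent tool writes.
        if event.get("type") == "tool_result" and event.get("is_error"):
            failures[event.get("name", "unknown")] += 1
    return failures

print(tool_failure_digest(Path("session.jsonl")))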
Pass 3: extract candidate lessons
Only extract a learning candidate when there is evidence.
Good candidates sound like this:
"When X happens, test Y before concluding Z."
"Do not do X unless condition Y is true."
"For this repo, use X as the canonical source of truth."
"This provider returns error X when configuration Y is missing."
Weak candidates sound like:
"Be careful."
"Improve docs."
"Use better testing."
"The agent should remember this."
Specific beats broad.
Promotion rules
Most extracted lessons should remain candidates.
Promote a lesson only when it passes four tests.
1. Is it repeatable?
One odd failure is not automatically a rule.
Repeated failures, reproducible behaviour, or provider documentation make a
stronger case.
2. Is it scoped?
A rule must say where it applies.
Good:
For this repo's GSC connector, use the domain property.
Bad:
GSC uses domain properties.
3. Is it verified?
Evidence can be:
a passing test
a successful API call
a failing API call with exact error
a commit
a browser check
a report
an official doc
Do not promote naked opinion.
4. Is it worth interrupting future agents?
Every promoted rule adds cognitive load.
If the lesson is rare, keep it searchable.
If the lesson is common and expensive to rediscover, put it in the runbook.
A minimal implementation
You can build a useful version without a graph database.
Start here:
Keep raw agent logs local.
Create one session card per completed session.
Store cards as JSONL.
Search cards before reading transcripts.
Keep durable rules in Markdown runbooks.
Add evidence links back to commits, reports, or transcript spans.
That is enough for a small team.
A slightly stronger version adds:
local embeddings over session cards
a tool-call digest
a simple status field for learning candidates
a weekly review where candidates are promoted, rejected, or left alone
Only add a knowledge graph when relationships become the bottleneck:
which sessions produced this rule
which rule superseded another rule
which repo owns this decision
which source contradicted it
Do not start with the graph.
Start with cards.
Common traps
Summarizing the summarizer
If your summarizer is itself an agent, it may create new sessions.
If your pipeline ingests those sessions, it can start summarizing its own
summaries.
That creates noisy, recursive junk.
Fix it with a simple guard:
tag automated summarizer sessions
exclude them from normal learning extraction
keep generated summaries separate from human work sessions
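A sketch of that guard, assuming the session cards carry a source field and optional tags (both names are illustrative):

cards = [
    {"task": "Debug missing SEO analytics data", "source": "human-session"},
    {"task": "Summarize yesterday's sessions", "source": "summarizer"},
]

def is_learnable(card: dict) -> bool:
    """Exclude automated summarizer output from normal learning extraction."""
    if card.get("source") == "summarizer":
        return False
    return "automated-summary" not in card.get("tags", [])

learnable = [c for c in cards if is_learnable(c)]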
Indexing secrets
Agent logs often contain command output.
Command output sometimes contains secrets.
Redact before indexing.
Do not rely on "we will be careful later".
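A sketch of a first-pass redactor. The patterns are illustrative and deliberately broad; a maintained secret scanner should do the real work, with something like this as the last gate before indexing:

import re

# Deliberately broad patterns; tune them for your stack.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[=:]\s*\S+"),
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key id shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]+?-----END [A-Z ]*PRIVATE KEY-----"),
]

def redact(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(redact("export GSC_API_KEY=abc123"))  # -> export GSC_[REDACTED]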
Treating every agent conclusion as truth
Agents write confident sentences.
Confidence is not evidence.
Promote only lessons with proof.
Creating duplicate docs
If a lesson belongs in a runbook, put it there.
If it belongs in repo instructions, add a pointer to the runbook.
Do not copy the same rule into five places unless those places have different
audiences and you can keep them synchronized.
Missing provenance
Every promoted lesson should answer:
where did this come from
who or what verified it
when was it last checked
what evidence would overturn it
Without provenance, a knowledge base becomes folklore.