FIELD STUDY · AI CODING AGENTS

I Spent 30 Days Tracking the Re-Read Tax in My Claude Code Bill. 73% Was Redundant.

A field measurement of what AI coding agents actually spend tokens on — and a working framework for cutting it.

Where a $4.20 Claude Code session actually goes — 73% re-reads, 27% productive

A month ago I noticed something strange in my Claude Code bill.

I was running long coding sessions on a 400-file Next.js project — real production work, multi-file refactors, the usual. The bills were creeping up faster than I expected. So I did what every engineer does when a number doesn't match their model: I added logging.

What I found surprised me.

73% of the tokens I was paying for weren't reasoning. They weren't tool calls. They were the agent re-reading files it had already read minutes earlier.

This piece is the methodology, the data, and the design space of fixes — written for anyone whose Claude Code or Cursor bill has started looking less like a tool subscription and more like a tax.


The setup: how I logged the re-reads

You can replicate this on any Claude Code session. Here's what I did.

For 30 days I instrumented every coding session by piping Claude Code's tool-call logs through a small wrapper that captured:

{
  "turn": 47,
  "tool": "read_file",
  "path": "src/lib/auth.ts",
  "tokens_in": 3214,
  "tokens_out": 0,
  "session_id": "...",
  "timestamp": "..."
}

Across 30 days I logged 47 sessions ranging from quick bug-fix runs (~30 turns) to long architectural refactors (1,000+ turns). Total: 18,442 tool calls, ~14 million input tokens.

Then I bucketed every read_file call by file path and counted how many times the same path was read inside the same session.

The result is the "re-read percentage": of all tokens spent reading files, what fraction was redundant — i.e., the agent had already read that file in a previous turn of the same session.

If you want to do this yourself, the wrapper is ~80 lines of shell + jq. The harder part is staying disciplined enough to run every session through it for a month.
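
The jq version is specific to my wrapper, so here's a rough Python equivalent of the counting step. It assumes the wrapper wrote one JSON object per line in the shape shown above; adjust the field names to whatever your logging actually captures.

import json
import sys
from collections import defaultdict

def reread_percentage(log_path):
    """Fraction of read_file tokens spent on paths already read earlier in the same session."""
    seen = defaultdict(set)          # session_id -> paths read so far in that session
    total = redundant = 0
    with open(log_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("tool") != "read_file":
                continue
            tokens = event.get("tokens_in", 0)
            total += tokens
            if event["path"] in seen[event["session_id"]]:
                redundant += tokens  # same path, same session: a re-read
            else:
                seen[event["session_id"]].add(event["path"])
    return redundant / total if total else 0.0

if __name__ == "__main__":
    print(f"re-read tax: {reread_percentage(sys.argv[1]):.1%} of read tokens")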


The data

Three findings stood out.

1. The longer the session, the worse the re-read tax

On short sessions (<50 turns), re-reads were ~25% of read tokens. Annoying but tolerable.

On long sessions (500+ turns), re-reads were 78%. The agent was, on average, reading every relevant file 4–5 times per session.

Aggregated across all 47 sessions, 73% of all read tokens were re-reads, weighted toward long sessions because that's where most of the spend is.

Re-read tokens as % of all read tokens by session length
Re-read tokens as a percentage of all read tokens, bucketed by session length. The pattern is structural — the longer the agent runs, the more it forgets and re-reads.

2. The same handful of files got read over and over

In one 1,237-turn session on the Next.js project, the agent read src/lib/auth.ts eleven times. It read prisma/schema.prisma nine times. It read next.config.mjs seven times. None of these files changed during the session.

Top files re-read in one session
The top 8 most-re-read files in a single 1,237-turn session. Eight files accounted for 62% of all read tokens — and seven of them were configuration or type definitions that didn't change once.

The agent wasn't being stupid. As context filled up, it would lose track of what it already knew, and at the next turn it would re-grep, re-read, and re-load the same file to "remember."

This is structural. There's no prompt-engineering fix for it.

3. The cost adds up faster than you think

For the 1,237-turn session, the math worked out to:

Bucket                   | Tokens | Cost
Total in-session input   | 1.4M   | $4.20
Re-read tokens (73%)     | ~1M    | $3.07
New / productive tokens  | ~400k  | $1.13

That's $3 of pure redundancy on a single session.

Multiply by 5 sessions a day and 22 working days, and that's roughly $330 a month per developer spent purely on re-loading context the agent had already seen.

Annualized, across team sizes, the picture gets sharper:

Annual re-read tax by team size
Annual re-read tax projected by team size. A 25-engineer team spends ~$99k/year on the agent re-loading context it already had. That's not the tool's fault — it's the structural memory gap, multiplied by people.

If your team has 10 engineers on Claude Code or Cursor Pro, that's $40k/year of re-read tax. Possibly more — that calculation assumes only one model. Switching to Opus for harder reasoning multiplies the bill by 5–7x on the same redundancy.
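
For anyone who wants to sanity-check the arithmetic, here's the back-of-envelope in code. The $3 per million input tokens is an assumption (roughly Sonnet-class pricing); swap in whatever your plan actually charges.

# Back-of-envelope for the numbers above.
PRICE_PER_M_INPUT = 3.00             # assumed rate, $ per million input tokens

session_input_tokens = 1_400_000
reread_fraction = 0.73

session_cost = session_input_tokens / 1e6 * PRICE_PER_M_INPUT         # ~$4.20
reread_cost_per_session = session_cost * reread_fraction              # ~$3.07

sessions_per_day, working_days = 5, 22
monthly_tax_per_dev = reread_cost_per_session * sessions_per_day * working_days
annual_tax_per_dev = monthly_tax_per_dev * 12

print(f"per developer: ${monthly_tax_per_dev:,.0f}/month, ${annual_tax_per_dev:,.0f}/year")
for team in (10, 25):
    # lands within a couple of percent of the ~$40k and ~$99k figures above
    # (the post rounds the per-session cost from $3.07 down to $3)
    print(f"{team} engineers: ~${annual_tax_per_dev * team:,.0f}/year in re-reads")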


Why it happens (and why prompt engineering won't fix it)

This isn't a bug in Claude Code or Cursor. It's how every current-generation agent works without a persistent memory layer.

The architecture, simplified:

  1. Turn N starts. Agent has a fresh context window with the system prompt and the last few turns of conversation.
  2. Agent needs to know about a function. It calls read_file or grep to find it. Reads ~3k tokens.
  3. Turn N+1. Agent finishes the immediate task. As the conversation continues, some of that earlier context falls out of the window.
  4. Turn N+47. Agent gets a related question. The earlier read has long since dropped out of context. Agent re-reads the same file. ~3k more tokens. Same content.

The agent has no persistent memory between turns. The chat history is the memory, and the chat history has a hard limit. So at some point — usually around turn 50 — every read becomes a potential re-read.
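
A toy model makes the mechanism concrete. This is not how any real agent is implemented, just a sketch of a bounded context window that evicts the oldest reads, which is enough to reproduce the re-read pattern:

from collections import OrderedDict

class ToyContext:
    """A bounded 'context window' of file reads, evicted oldest-first. A sketch, not a real agent."""
    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.window = OrderedDict()      # path -> tokens, oldest first
        self.ever_read = set()
        self.reads = self.rereads = 0

    def need_file(self, path, tokens):
        if path in self.window:          # still in context: no read needed
            self.window.move_to_end(path)
            return
        self.reads += 1
        if path in self.ever_read:       # knew it once, lost it to eviction
            self.rereads += 1
        self.ever_read.add(path)
        self.window[path] = tokens
        while sum(self.window.values()) > self.max_tokens:
            self.window.popitem(last=False)

ctx = ToyContext(max_tokens=10_000)
for turn in range(200):                  # 8 hot files of ~3k tokens, touched round-robin
    ctx.need_file(f"src/file_{turn % 8}.ts", tokens=3_000)
print(f"{ctx.reads} reads, {ctx.rereads} re-reads")   # almost every read is a re-read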

You can patch this with CLAUDE.md or .cursorrules, but those are flat text files. They tell the agent about your project ("this codebase uses Prisma, our auth is in src/lib/auth.ts"), not its actual contents. The agent still has to read the file when it needs the actual code.

Bigger context windows don't fix it either. Even a hypothetical 100M-token context window doesn't mean the agent reasons over 100M tokens; it means the file you read three turns ago might still be visible. Helpful, but not a structural fix, because (a) bigger windows are slower and more expensive per query, and (b) the agent still hallucinates symbols when the relevant file is buried 80M tokens deep in context.

The real fix is to give the agent persistent, queryable memory of your codebase that lives outside the chat window.


The fix space: three approaches

There are three architectural approaches to giving an agent persistent code memory. Each has real trade-offs, and the right answer depends on your codebase and your tolerance for setup.

The three architectural approaches
Three architectural approaches to persistent code memory for AI agents. None is universally best — the right choice depends on codebase size and query mix.

Approach 1 — Text-based memory (CLAUDE.md, .cursorrules, "memory bank" patterns)

The simplest version. You write a text file describing your project's structure and stuff it into every prompt.

Pros: zero setup, works today, no extra tooling.

Cons: doesn't store actual code. Just metadata. The agent still has to read the file when it wants the function body. Re-read tax barely moves.

When it makes sense: small codebases (<50 files) where the file map matters more than the file contents.

Approach 2 — Embedding-based memory (vector search over your code)

Index every file (or chunk of file) as an embedding. At query time, embed the question, retrieve the top-K matching chunks. This is what most RAG-style code-memory tools do.

Pros: fuzzy queries work — "find code that handles user auth" returns relevant chunks even if "auth" isn't in the function name. Setup is moderate.

Cons: every query costs embedding API calls (real money — typical pricing is $0.02–$0.10 per million tokens for the embedding model itself). Indexing a 100k-LOC codebase ranges from ~$50 to ~$900 just for the embedding pass, depending on which model you pick. Retrieval is probabilistic — you sometimes get the wrong chunk back, which leads the agent to confidently reason over the wrong code. And you need a vector database running somewhere.

When it makes sense: when fuzzy semantic search dominates your workflow and you don't mind the per-query cost.
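
A minimal sketch of the retrieval loop, with embed() left as a placeholder for whatever embedding model you'd actually call. The chunking and function names here are illustrative, not any particular tool's API.

import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: call your embedding model here and return a vector."""
    raise NotImplementedError

def chunk_file(path: str, lines_per_chunk: int = 40):
    with open(path) as f:
        lines = f.readlines()
    for i in range(0, len(lines), lines_per_chunk):
        yield path, i + 1, "".join(lines[i:i + lines_per_chunk])

def build_index(paths):
    index = []
    for path in paths:
        for p, start_line, text in chunk_file(path):
            index.append((p, start_line, text, embed(text)))   # one API call per chunk: the indexing cost
    return index

def query(index, question: str, k: int = 5):
    q = embed(question)                                         # one more API call per query
    scored = [
        (float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))), p, line, text)
        for p, line, text, v in index
    ]
    return sorted(scored, reverse=True)[:k]                     # top-k is a best guess, not a guarantee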

Approach 3 — Graph-based memory (deterministic symbol index)

Parse the codebase into an abstract syntax tree (AST), extract every symbol — function, class, import, call site — and store it as a graph. At query time, the agent asks deterministic questions: "does function X exist?", "where is X called from?", "what does X import?" The answer is a sub-millisecond graph lookup, not a probabilistic search.

Pros: zero per-query cost (just graph traversal). Deterministic — the agent gets a definitive yes/no instead of a maybe. Hallucinated symbols drop to zero because the agent can verify a symbol exists before using it. Indexing once gives you reliable answers forever (until the code changes).

Cons: doesn't handle fuzzy semantic queries as well as embeddings. Requires the parser to support every language in your codebase. The index has to update when code changes (mostly automatic with file-watchers, but a real engineering concern).

When it makes sense: large codebases where most agent queries are structural (find this function, who calls this, does X exist) rather than fuzzy (find code that does X-ish thing).
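
To make "deterministic symbol index" concrete, here's a toy version for Python files built on the standard-library ast module. Real tools parse many languages and track far more edge types, but the query shape is the same: exact answers from a pre-built graph, at zero token cost. (validate_token below is just an example symbol name.)

import ast
from collections import defaultdict
from pathlib import Path

def build_symbol_graph(root: str):
    """Index function definitions and call sites across a directory of .py files."""
    defs = {}                      # function name -> (file, line)
    callers = defaultdict(list)    # callee name -> [(file, line), ...]
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defs[node.name] = (str(path), node.lineno)
            elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callers[node.func.id].append((str(path), node.lineno))
    return defs, callers

defs, callers = build_symbol_graph("src")
# Deterministic, zero-token questions the agent can ask instead of re-reading files:
print("validate_token" in defs)        # does this symbol exist? yes/no, not "probably"
print(defs.get("validate_token"))      # where is it defined?
print(callers.get("validate_token"))   # who calls it?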

In practice, you can combine approaches. Some tools layer embeddings on top of graph indexes for fuzzy queries, falling back to deterministic lookups for structural ones. There's no single right answer.


What the math looks like with persistent memory

I rebuilt the same 1,237-turn Next.js session with a persistent graph-based memory layer running in the background. Same agent. Same prompts. Same model. Same task.

The difference:

Before vs after persistent memory
Same task, same agent, same model — twice. Persistent graph memory cut read calls 9.8×, input tokens 5.8×, hallucinated symbols to zero, and per-task cost 10×.

That's a 10x reduction on this run. The hallucinations going to zero matters as much as the cost — every hallucinated symbol cost me debugging time downstream.

Caveats: this is one session, on one repo, with one task profile. The reduction varies. On simple bug fixes (single-file changes), the gain is smaller — sometimes negligible — because there isn't much re-reading to eliminate. On long architectural sessions across many files, it's bigger.

I'm publishing 30 of these comparisons next month with full methodology and raw logs. The headline finding (~70% reduction on long multi-file sessions) seems to hold across repos, but I want enough data points before I claim it as universal.


How to test this on your own bill

If you're paying for Claude Code or Cursor, here's the cheapest, most honest version of this experiment you can run:

  1. Pick one real session you ran this week. A long one — 200+ turns ideally.
  2. Pull the tool-call log from your Claude Code session history (it's in ~/.claude/sessions/ for Claude Code).
  3. Count read_file calls per file path. If any file appears more than once, every read after the first is your re-read tax.
  4. Multiply: for each re-read file, (times read − 1) × average tokens per read, then sum. That's your tax for that session (sketched in code below). Annualize it for a sanity check.
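
Step 4 as a tiny script, using the counts from step 3. The per-read token counts below are illustrative, and the $3-per-million input rate is the same pricing assumption as earlier; use your own numbers.

# path -> (times read in the session, average tokens per read)  -- replace with your data
reads = {
    "src/lib/auth.ts":      (11, 3_200),
    "prisma/schema.prisma": (9,  2_800),
    "next.config.mjs":      (7,  1_100),
}
PRICE_PER_M_INPUT = 3.00   # assumed $/million input tokens

redundant_tokens = sum((count - 1) * avg for count, avg in reads.values())
print(f"re-read tax this session: {redundant_tokens:,} tokens "
      f"(${redundant_tokens / 1e6 * PRICE_PER_M_INPUT:.2f})")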

I'd bet your number is between 50% and 80%. Mine averaged 73%. DM me yours — I'm collecting data points for the longer benchmark.


What I built

I'm Aurelian, founder of ArgosBrain. We took the graph-based approach: SCIP + LSP + tree-sitter fused into one local Rust binary. Sub-50ms symbol lookups, $0/query, branch-aware so it survives git checkout. MCP-native, so it plugs into Claude Code, Cursor, Codex and any MCP-compatible agent.

We've benchmarked it on Kubernetes (~2M LOC) and VS Code-scale repos because that's where the existing tools tend to break. Free tier indexes one project, no card required — argosbrain.com if you want to try it on your codebase.

But the bigger point isn't ArgosBrain. It's that the re-read tax is real, structural, and not solved by bigger context windows or better prompts. If you're paying for Claude Code or Cursor at any volume, this number is hiding in your bill. Find it. Fix it with whichever tool fits your workflow.

I'd love to hear what your re-read percentage looks like. Reply, DM me on X, or email [email protected]. I'll post the aggregated numbers (anonymized) once I've collected ~50 data points.