Blog

Field reports. Engineering writeups. Case studies.

Long-form posts on what AI coding agents actually do under the hood, what they cost, and where structural code memory changes the math. Companion to the research papers — the papers anchor the methodology, the blog tells the operating story. Every post is reproducible: code, prompts, raw logs, and methodology are linked at the bottom.

01 Latest

What's new.

The most recent post is at the top. Engineering writeups appear here when the underlying experiment is reproducible end-to-end and the raw data is published alongside the prose. We don't ship marketing essays disguised as field reports.

Field study · 2026-05-06 · NEW

I Spent 30 Days Tracking the Re-Read Tax in My Claude Code Bill. 73% Was Redundant.

A 30-day field measurement on 47 real Claude Code sessions (18,442 tool calls, ~14M input tokens). Bucketed every read_file call by file path; counted re-reads per session. Result: 73% of read tokens were redundant. The longer the session, the worse the tax — re-reads hit 78% on 500+ turn sessions. Methodology, raw data, and the design space of fixes (text memory, embedding memory, graph memory) — including a 10× cost reduction reproduced on the same 1,237-turn session with persistent graph memory.

Read the writeup →
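The bucketing the post describes (read_file calls keyed by file path, repeats counted as redundant) can be sketched roughly like this. The log-record schema here is hypothetical, not the actual session format:

```python
from collections import Counter

def reread_stats(tool_calls):
    """Bucket read_file calls by path; return the redundant share of read tokens.

    tool_calls: list of dicts like {"tool": "read_file",
    "path": "src/main.py", "tokens": 1200} -- an assumed
    log schema for illustration only.
    """
    seen = Counter()
    total_tokens = 0
    redundant_tokens = 0
    for call in tool_calls:
        if call["tool"] != "read_file":
            continue
        total_tokens += call["tokens"]
        if seen[call["path"]] > 0:          # file was already read this session
            redundant_tokens += call["tokens"]
        seen[call["path"]] += 1
    return redundant_tokens / total_tokens if total_tokens else 0.0

calls = [
    {"tool": "read_file", "path": "a.py", "tokens": 100},
    {"tool": "read_file", "path": "a.py", "tokens": 100},
    {"tool": "read_file", "path": "b.py", "tokens": 50},
]
print(reread_stats(calls))  # 100 repeated tokens out of 250 read -> 0.4
```

A sketch under those assumptions; the full methodology and raw logs are in the linked writeup.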
Engineering writeup · 2026-04-27

From 156 Candidates to 3 Worth Reviewing — Security Triage at VS Code Scale

We pointed ArgosBrain's structural security review at Microsoft VS Code (MIT, commit 1fa1b7a): 12,000 files, 151,620 symbols, 25 sink categories. Result: 156 high-severity candidates surfaced, 0 reachable from untrusted input within depth 8, 3 worth manual review in core — in 8 seconds for $0.30. Naïve grep+LLM baseline for the same coverage: $18-$36 and 30-60 minutes. Findings inside proprietary extensions/copilot/* were disclosed privately to Microsoft MSRC, not reproduced here.

Read the writeup →
Field report · 2026-04-25

ArgosBrain on Kubernetes 1.32.0: Two Live Runs of an MCP-Served Code-Memory Engine

A field report on two end-to-end runs against Kubernetes v1.32.0 (17,171 files, 303,722 symbols, 2,245,124 call-graph edges). Run A: a 22-category security audit, $0.33, 70 seconds, zero reachable critical findings. Run B: an architectural code tour deducing the engineering culture in 11 queries, $0.11, 6 seconds. Library gaps publicly disclosed; v0.8.9 ships the fixes.

Read the report →
Engineering writeup · 2026-04-24

Stress-Testing Code Memory at Kubernetes Scale

LongMemCode kubernetes-2k: 1,456 scenarios mined from real Kubernetes v1.32.0 (333 MB, 38,771 symbols, 232,756 call-graph edges) and 100 real bug-fix commits pulled from git history. ArgosBrain hits 99.16% accuracy at 0.404 ms p99 latency and $0 per query — 87 ms total wall-clock for the full suite. Eight honest misses we publish in full. The methodology, every query, and every returned symbol are public on GitHub for byte-for-byte reproduction.

Read the writeup →
02 Looking for research?

Peer-shape papers live next door.

Three CC BY 4.0 pre-prints on code-memory architecture, the structural-versus-semantic retrieval taxonomy, and the LongMemCode benchmark methodology — with full source LaTeX and reproducibility recipes — live at argosbrain.com/papers. The blog is the operating story; the papers are the receipts.