Long-form posts on what AI coding agents actually do under the hood, what they cost, and where structural code memory changes the math. Companion to the research papers — the papers anchor the methodology, the blog tells the operating story. Every post is reproducible: code, prompts, raw logs, and methodology are linked at the bottom.
The most recent post is at the top. Engineering writeups appear here when the underlying experiment is reproducible end-to-end and the raw data is published alongside the prose. We don't ship marketing essays disguised as field reports.
A 30-day field measurement on 47 real Claude Code sessions (18,442 tool calls, ~14M input tokens). Bucketed every read_file call by file path; counted re-reads per session. Result: 73% of read tokens were redundant. The longer the session, the worse the tax: re-reads hit 78% on sessions of 500+ turns. Covers the methodology, the raw data, and the design space of fixes (text memory, embedding memory, graph memory), including a 10× cost reduction reproduced on the same 1,237-turn session with persistent graph memory.
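For readers who want to preflight the measurement on their own logs before opening the post, here is a minimal sketch of the re-read counting step. It assumes a JSONL session log with `tool`, `path`, and `input_tokens` fields; those names and the file name are illustrative, and the post links the exact parser and raw logs.

```python
import json
from collections import defaultdict

def redundant_read_tokens(session_log_path):
    """Count read tokens spent re-reading files already read earlier in the
    same session (assumes a JSONL log with tool/path/input_tokens fields)."""
    tokens_by_path = defaultdict(list)
    with open(session_log_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("tool") != "read_file":
                continue
            # Bucket every read_file call by file path, keeping its token cost.
            tokens_by_path[event["path"]].append(event["input_tokens"])

    total = sum(sum(reads) for reads in tokens_by_path.values())
    # Everything after the first read of a given path is a re-read.
    redundant = sum(sum(reads[1:]) for reads in tokens_by_path.values())
    return redundant, total

redundant, total = redundant_read_tokens("session.jsonl")
print(f"{redundant / total:.1%} of read tokens were redundant")
```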
We pointed ArgosBrain's structural security review at Microsoft VS Code (MIT, commit 1fa1b7a): 12,000 files, 151,620 symbols, 25 sink categories. Result: 156 high-severity candidates surfaced, 0 reachable from untrusted input within depth 8, 3 worth manual review in core — in 8 seconds for $0.30. Naïve grep+LLM baseline for the same coverage: $18-$36 and 30-60 minutes. Findings inside proprietary extensions/copilot/* were disclosed privately to Microsoft MSRC, not reproduced here.
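The "0 reachable from untrusted input within depth 8" claim boils down to a depth-bounded traversal of the call graph from untrusted entry points toward sink candidates. The sketch below shows that check under stated assumptions: the adjacency-map graph format, the entry-point and sink sets, and the function names are illustrative, not ArgosBrain's actual API.

```python
from collections import deque

def reachable_within_depth(call_graph, entry_points, sinks, max_depth=8):
    """Breadth-first search over a caller -> callees adjacency map.
    Returns the sink symbols reachable from any untrusted entry point
    in at most max_depth call-graph hops."""
    seen = set(entry_points)
    frontier = deque((symbol, 0) for symbol in entry_points)
    hits = set()
    while frontier:
        symbol, depth = frontier.popleft()
        if symbol in sinks:
            hits.add(symbol)
        if depth == max_depth:
            continue  # do not expand past the depth bound
        for callee in call_graph.get(symbol, ()):
            if callee not in seen:
                seen.add(callee)
                frontier.append((callee, depth + 1))
    return hits

# Toy graph: no sink reachable from the untrusted entry point within 8 hops.
graph = {"handleMessage": ["parseArgs"], "parseArgs": ["validate"]}
print(reachable_within_depth(graph, {"handleMessage"}, {"execShell"}))  # set()
```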
A field report on two end-to-end runs against Kubernetes v1.32.0 (17,171 files, 303,722 symbols, 2,245,124 call-graph edges). Run A: a 22-category security audit, $0.33, 70 seconds, zero reachable critical findings. Run B: an architectural code tour deducing the engineering culture in 11 queries, $0.11, 6 seconds. Library gaps publicly disclosed; v0.8.9 ships the fixes.
LongMemCode kubernetes-2k: 1,456 scenarios mined from real Kubernetes v1.32.0 (333 MB, 38,771 symbols, 232,756 call-graph edges) and 100 real bug-fix commits pulled from git history. ArgosBrain hits 99.16% accuracy at 0.404 ms p99 latency and $0 per query, with 87 ms total wall-clock for the full suite. We publish all eight honest misses in full. The methodology, every query, and every returned symbol are public on GitHub for byte-for-byte reproduction.
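For orientation, the headline numbers are plain aggregates over per-scenario results. The sketch below shows how accuracy, p99 latency, and total wall-clock would be computed from such a list; the record fields (`correct`, `latency_ms`) are assumed for illustration, and the published methodology and queries on GitHub remain the reference.

```python
import math

def score(results):
    """Aggregate per-scenario results into headline metrics.
    Each record is assumed to carry 'correct' (bool) and 'latency_ms' (float);
    the field names are illustrative, not the benchmark's actual schema."""
    latencies = sorted(r["latency_ms"] for r in results)
    accuracy = sum(r["correct"] for r in results) / len(results)
    p99 = latencies[math.ceil(0.99 * len(latencies)) - 1]  # nearest-rank p99
    wall_clock_ms = sum(latencies)
    return accuracy, p99, wall_clock_ms

# Toy usage: three scenarios, one miss.
print(score([{"correct": True, "latency_ms": 0.05},
             {"correct": False, "latency_ms": 0.12},
             {"correct": True, "latency_ms": 0.40}]))
```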
Three CC BY 4.0 pre-prints on code-memory architecture, the structural-versus-semantic retrieval taxonomy, and the LongMemCode benchmark methodology, with full LaTeX source and reproducibility recipes, live at argosbrain.com/papers. The blog is the operating story; the papers are the receipts.