Token-reduction percentages are easy to inflate. We publish two baselines — a naive ceiling and a realistic floor — and show every formula, every assumption, every step so the numbers survive technical scrutiny. If a sceptical engineer wants to demolish a claim from the homepage, they should be able to point at this page and check our work in 5 minutes.
The most aggressive number you can put on a marketing page is the comparison against the worst-case approach: load every file in the codebase into the LLM context window. For Kubernetes that's 17,171 files × ~5,000 tokens/file ≈ 86 million tokens, which at Opus 4.7 input pricing ($15 per million tokens) costs $1,287 per query. ArgosBrain serves the same query for $0.33. That's a 99.97% reduction — true, calculable, and meaningless because no agent on earth actually reads every file.
The honest comparison is against a competent agent that does what a senior engineer does: grep first, read selectively, summarise. For a Kubernetes-scale audit a competent agent loads ~150 files × 5K tokens ≈ 750,000 tokens, which costs $11.25. ArgosBrain's $0.33 is a 97.1% reduction against this floor.
Both numbers are real. The 99.97% is the ceiling of the win; the 97.1% is the floor. We publish both because a range is harder to cherry-pick than a single point estimate: the ceiling shows what worst-case spend looks like, and the floor is what you should actually expect against a competent agent.
If a marketing page anywhere on this site shows only the ceiling number without a link back to this page, that's a bug; please let us know.
The naive baseline:

```
naive_tokens = files × tokens_per_file = 17,171 × ~5,000 ≈ 86M
naive_cost   = naive_tokens ÷ 1M × $15 ≈ $1,287
```

Where:

- `files` = 17,171, counted with `find . -type f -name '*.ext' | wc -l` across the language extensions ArgosBrain supports.
- `tokens_per_file` ≈ 5,000, the median (see the FAQ below for the measured range).

The realistic baseline:

```
realistic_tokens = 150 × ~5,000 = 750,000
realistic_cost   = 750,000 ÷ 1M × $15 = $11.25
```

Where:

- 150 is the number of files a competent agent loads for a Kubernetes-scale audit after grepping.

The reduction:

```
reduction  = 1 − argos_cost ÷ baseline_cost
naive:       1 − $0.33 ÷ $1,287 ≈ 99.97%
realistic:   1 − $0.33 ÷ $11.25 ≈ 97.1%
```

Where:

- `argos_cost` = $0.33, the cost of the LLM driving the workflow (~22K input tokens at $15/MTok).
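The arithmetic above can be checked mechanically. A minimal sketch in Python, using only the token counts and prices quoted on this page:

```python
# Published inputs (all from this page).
FILES = 17_171            # Kubernetes source files
TOKENS_PER_FILE = 5_000   # median tokens per file (see FAQ)
OPUS_INPUT = 15.00        # $ per million input tokens
ARGOS_TOKENS = 22_000     # MCP-response input tokens per ArgosBrain query

def cost(tokens: int, price_per_mtok: float) -> float:
    """Input cost in dollars for `tokens` at `price_per_mtok` $/MTok."""
    return tokens / 1_000_000 * price_per_mtok

naive = cost(FILES * TOKENS_PER_FILE, OPUS_INPUT)    # ≈ $1,287
realistic = cost(150 * TOKENS_PER_FILE, OPUS_INPUT)  # $11.25
argos = cost(ARGOS_TOKENS, OPUS_INPUT)               # $0.33

print(f"naive reduction:     {1 - argos / naive:.2%}")      # 99.97%
print(f"realistic reduction: {1 - argos / realistic:.2%}")  # 97.07%
```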
Every cost claim on this site reproduces in two MCP sessions against the same codebase. Pick any open-source repo (Kubernetes, ripgrep, your own monolith). Run both flows. Compare.
For Kubernetes 1.32 we measured: realistic = 97.1%, naive = 99.97%. Your numbers will vary by codebase shape, query specificity, and the agent's grep skill — that's exactly why we publish a range, not a single point estimate.
Keep us honest. If your reproduction lands materially different numbers, file an issue. We've been wrong before; we'll publish the correction with the same prominence as the original claim.
We deliberately quote against the most expensive frontier model for two reasons: (1) it's what enterprise customers actually run for security-critical reachability proofs, and (2) cheaper models make ArgosBrain look more attractive, so quoting against Opus is the conservative choice. Provider prices change; we update this table when they do.
| Provider · Model | Input $/MTok | K8s naive cost | K8s realistic cost | ArgosBrain cost |
|---|---|---|---|---|
| Anthropic · Opus 4.7 (headline) | $15.00 | $1,287 | $11.25 | $0.33 |
| Anthropic · Sonnet 4.7 | $3.00 | $257 | $2.25 | $0.07 |
| Anthropic · Haiku 4.7 | $0.80 | $69 | $0.60 | $0.02 |
| OpenAI · GPT-4o | $2.50 | $215 | $1.88 | $0.06 |
| Google · Gemini 2.5 Pro | $1.25 | $107 | $0.94 | $0.03 |
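Every row in the table is derived from the same three token counts; only the price column changes. A short sketch that regenerates the rows (prices are the ones listed above, printed with cents rather than the table's whole-dollar rounding):

```python
PRICES = {  # $ per million input tokens, as listed in the table above
    "Opus 4.7": 15.00,
    "Sonnet 4.7": 3.00,
    "Haiku 4.7": 0.80,
    "GPT-4o": 2.50,
    "Gemini 2.5 Pro": 1.25,
}

# The three token counts used throughout this page.
NAIVE_TOK = 17_171 * 5_000   # every file in the repo
REALISTIC_TOK = 150 * 5_000  # competent grep-first agent
ARGOS_TOK = 22_000           # ArgosBrain MCP workflow

for model, price in PRICES.items():
    naive, realistic, argos = (
        tok / 1_000_000 * price for tok in (NAIVE_TOK, REALISTIC_TOK, ARGOS_TOK)
    )
    print(f"{model:16s} ${naive:>9,.2f} ${realistic:>6.2f} ${argos:>5.2f}")
```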
Prices as of 2026-04-25. Output tokens cost more per token (typically 4-5× input) but drop out of this comparison: the agent emits the same ~2K-token summary whichever way it retrieved context, so the output bill is identical on both sides and small next to the baselines' input bills. ArgosBrain's per-query cost is for the LLM driving the workflow (~22K input tokens of MCP responses); the brain itself is $0 per query.
It's the upper bound under a clearly-labelled worst-case assumption. We always publish it next to the realistic-baseline number on the same line. Anyone who cites the 99.97% without the 97.1% is misrepresenting our claim — point them here.
Run tiktoken against random samples from your own codebase. We have. The median is between 4,200 and 6,800 tokens per file depending on language. We use 5K as the round number and disclose the range here.
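A sketch of that sampling procedure. The extension list is illustrative, and the block falls back to a rough ~4-characters-per-token heuristic when tiktoken isn't installed; the real measurement uses tiktoken's encoder:

```python
import os
import random
import statistics
from pathlib import Path

try:
    import tiktoken  # real tokenizer, if installed
    _ENC = tiktoken.get_encoding("cl100k_base")

    def count_tokens(text: str) -> int:
        return len(_ENC.encode(text))
except ImportError:

    def count_tokens(text: str) -> int:
        # Fallback: ~4 characters per token, a common rough heuristic for code.
        return max(1, len(text) // 4)

def median_tokens_per_file(root: str, exts=(".go", ".rs", ".py"), n=200) -> float:
    """Median token count over a random sample of source files under `root`."""
    paths = [
        os.path.join(d, name)
        for d, _, names in os.walk(root)
        for name in names
        if name.endswith(exts)
    ]
    sample = random.sample(paths, min(n, len(paths)))
    counts = [count_tokens(Path(p).read_text(errors="ignore")) for p in sample]
    return statistics.median(counts)
```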
Headline cost numbers should reflect the most expensive realistic deployment, not the cheapest. Customers running security-critical reachability proofs run Opus. Section 04 above shows the numbers for Sonnet, Haiku, GPT-4o, and Gemini — at every price point ArgosBrain stays in a fundamentally different regime.
For retrieval-heavy ArgosBrain workflows, the output side is tiny: typically ~2K output tokens of summary against a 750K-token realistic-baseline input bill. We omit it from the headline numbers because the same agent producing the same summary on the realistic baseline pays the same output-token cost, so it cancels out of the reduction percentages.
$0 per query. The brain is a Rust process running on your laptop or CI runner. No cloud round-trip, no per-query LLM call. The only costs are (1) the LLM that drives the workflow (the numbers above), (2) the one-time ingest cost (CPU + memory for ~30s on a small repo, ~10min on a monolith), and (3) the optional license fee if you're on Pro tier — but the per-query cost is zero either way.
Yes. Provider prices drop ~50% per year on average. Our cost stays $0. The reduction percentages hold steady as input pricing gets cheaper: ArgosBrain's LLM-driving cost and both baselines scale with the same input price, so the percentage is fixed by token counts, not dollars, while the absolute dollar savings shrink in proportion. The file-read economics stay the same. We re-publish the K8s table every quarter.
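Because ArgosBrain's LLM-driving cost and both baselines scale with the same input price, the reduction percentage is a pure token ratio and the price cancels out. A quick check, using the token counts quoted on this page:

```python
REALISTIC_TOK = 750_000  # realistic-baseline input tokens
ARGOS_TOK = 22_000       # ArgosBrain workflow input tokens

reductions = set()
for price in (15.00, 7.50, 3.75):  # $/MTok, halving twice
    # Numerator and denominator both scale with `price`, so it cancels.
    reductions.add(round(1 - (ARGOS_TOK * price) / (REALISTIC_TOK * price), 6))

print(reductions)  # a single value: the percentage is price-independent
```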
```sh
curl -fsSL https://argosbrain.com/install | sh
cd ~/code/kubernetes-1.32
argosbrain ingest .
# In Claude Code: /argos-security
# Then sum the tokens reported in your status bar.
# Compare against the table in Section 04 above.
```