Token-reduction percentages are easy to inflate. We publish two baselines — a naive ceiling and a realistic floor — and show every formula, every assumption, every step so the numbers survive technical scrutiny. If a sceptical engineer wants to demolish a claim from the homepage, they should be able to point at this page and check our work in 5 minutes.
The most aggressive number you can put on a marketing page is the comparison against the worst-case approach: load every file in the codebase into the LLM context window. For Kubernetes that's 17,171 files × ~5,000 tokens/file ≈ 86 million tokens, which at Opus 4.7 input pricing ($15 per million tokens) costs $1,287 per query. ArgosBrain serves the same query for $0.33. That's a 99.97% reduction — true, calculable, and meaningless because no agent on earth actually reads every file.
The honest comparison is against a competent agent that does what a senior engineer does: grep first, read selectively, summarise. For a Kubernetes-scale audit a competent agent loads ~150 files × 5K tokens ≈ 750,000 tokens, which costs $11.25. ArgosBrain's $0.33 is a 97.1% reduction against this floor.
Both numbers are real. The 99.97% is the ceiling of the win; the 97.1% is the floor. We publish both because a range is harder to cherry-pick than a single point estimate: the ceiling shows what worst-case spend looks like, and the floor is what you should actually expect against a competent agent.
If a marketing page anywhere on this site shows only the ceiling number without a link back to this page, that's a bug; please let us know.
The naive baseline:

```
naive_tokens = files × tokens_per_file = 17,171 × ~5,000 ≈ 86M
naive_cost   = naive_tokens ÷ 1M × $15 ≈ $1,287
```

Where:

- `files` = 17,171, counted with `find . -type f -name '*.ext' | wc -l` across the language extensions ArgosBrain supports.
- `tokens_per_file` ≈ 5,000, the median (see the FAQ below for the measured range).

The realistic baseline:

```
realistic_tokens = 150 × ~5,000 = 750,000
realistic_cost   = 750,000 ÷ 1M × $15 = $11.25
```

Where:

- 150 is the number of files a competent agent loads for a Kubernetes-scale audit after grepping.

The reduction:

```
reduction  = 1 − argos_cost ÷ baseline_cost
naive:       1 − $0.33 ÷ $1,287 ≈ 99.97%
realistic:   1 − $0.33 ÷ $11.25 ≈ 97.1%
```

Where:

- `argos_cost` = $0.33, the cost of the LLM driving the workflow (~22K input tokens at $15/MTok).
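The arithmetic above can be checked mechanically. A minimal sketch in Python, using only the token counts and prices quoted on this page:

```python
# Published inputs (all from this page).
FILES = 17_171            # Kubernetes source files
TOKENS_PER_FILE = 5_000   # median tokens per file (see FAQ)
OPUS_INPUT = 15.00        # $ per million input tokens
ARGOS_TOKENS = 22_000     # MCP-response input tokens per ArgosBrain query

def cost(tokens: int, price_per_mtok: float) -> float:
    """Input cost in dollars for `tokens` at `price_per_mtok` $/MTok."""
    return tokens / 1_000_000 * price_per_mtok

naive = cost(FILES * TOKENS_PER_FILE, OPUS_INPUT)    # ≈ $1,287
realistic = cost(150 * TOKENS_PER_FILE, OPUS_INPUT)  # $11.25
argos = cost(ARGOS_TOKENS, OPUS_INPUT)               # $0.33

print(f"naive reduction:     {1 - argos / naive:.2%}")      # 99.97%
print(f"realistic reduction: {1 - argos / realistic:.2%}")  # 97.07%
```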
Every cost claim on this site reproduces in two MCP sessions against the same codebase. Pick any open-source repo (Kubernetes, ripgrep, your own monolith). Run both flows. Compare.
For Kubernetes 1.32 we measured: realistic = 97.1%, naive = 99.97%. Your numbers will vary by codebase shape, query specificity, and the agent's grep skill — that's exactly why we publish a range, not a single point estimate.
Keep us honest. If your reproduction lands materially different numbers, file an issue. We've been wrong before; we'll publish the correction with the same prominence as the original claim.
We deliberately quote against the most expensive frontier model for two reasons: (1) it's what enterprise customers actually run for security-critical reachability proofs, and (2) cheaper models make ArgosBrain look more attractive, so quoting against Opus is the conservative choice. Provider prices change; we update this table when they do.
| Provider · Model | Input $/MTok | K8s naive cost | K8s realistic cost | ArgosBrain cost |
|---|---|---|---|---|
| Anthropic · Opus 4.7 (headline) | $15.00 | $1,287 | $11.25 | $0.33 |
| Anthropic · Sonnet 4.7 | $3.00 | $257 | $2.25 | $0.07 |
| Anthropic · Haiku 4.7 | $0.80 | $69 | $0.60 | $0.02 |
| OpenAI · GPT-4o | $2.50 | $215 | $1.88 | $0.06 |
| Google · Gemini 2.5 Pro | $1.25 | $107 | $0.94 | $0.03 |
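Every row in the table is derived from the same three token counts; only the price column changes. A short sketch that regenerates the rows (prices are the ones listed above, printed with cents rather than the table's whole-dollar rounding):

```python
PRICES = {  # $ per million input tokens, as listed in the table above
    "Opus 4.7": 15.00,
    "Sonnet 4.7": 3.00,
    "Haiku 4.7": 0.80,
    "GPT-4o": 2.50,
    "Gemini 2.5 Pro": 1.25,
}

# The three token counts used throughout this page.
NAIVE_TOK = 17_171 * 5_000   # every file in the repo
REALISTIC_TOK = 150 * 5_000  # competent grep-first agent
ARGOS_TOK = 22_000           # ArgosBrain MCP workflow

for model, price in PRICES.items():
    naive, realistic, argos = (
        tok / 1_000_000 * price for tok in (NAIVE_TOK, REALISTIC_TOK, ARGOS_TOK)
    )
    print(f"{model:16s} ${naive:>9,.2f} ${realistic:>6.2f} ${argos:>5.2f}")
```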
Prices as of 2026-04-25. Output tokens cost more per token (typically 4-5× input) but drop out of this comparison: the agent emits the same ~2K-token summary whichever way it retrieved context, so the output bill is identical on both sides and small next to the baselines' input bills. ArgosBrain's per-query cost is for the LLM driving the workflow (~22K input tokens of MCP responses); the brain itself is $0 per query.
It's the upper bound under a clearly-labelled worst-case assumption. We always publish it next to the realistic-baseline number on the same line. Anyone who cites the 99.97% without the 97.1% is misrepresenting our claim — point them here.
Run tiktoken against random samples from your own codebase. We have. The median is between 4,200 and 6,800 tokens per file depending on language. We use 5K as the round number and disclose the range here.
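A sketch of that sampling procedure. The extension list is illustrative, and the block falls back to a rough ~4-characters-per-token heuristic when tiktoken isn't installed; the real measurement uses tiktoken's encoder:

```python
import os
import random
import statistics
from pathlib import Path

try:
    import tiktoken  # real tokenizer, if installed
    _ENC = tiktoken.get_encoding("cl100k_base")

    def count_tokens(text: str) -> int:
        return len(_ENC.encode(text))
except ImportError:

    def count_tokens(text: str) -> int:
        # Fallback: ~4 characters per token, a common rough heuristic for code.
        return max(1, len(text) // 4)

def median_tokens_per_file(root: str, exts=(".go", ".rs", ".py"), n=200) -> float:
    """Median token count over a random sample of source files under `root`."""
    paths = [
        os.path.join(d, name)
        for d, _, names in os.walk(root)
        for name in names
        if name.endswith(exts)
    ]
    sample = random.sample(paths, min(n, len(paths)))
    counts = [count_tokens(Path(p).read_text(errors="ignore")) for p in sample]
    return statistics.median(counts)
```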
Headline cost numbers should reflect the most expensive realistic deployment, not the cheapest. Customers running security-critical reachability proofs run Opus. Section 04 above shows the numbers for Sonnet, Haiku, GPT-4o, and Gemini — at every price point ArgosBrain stays in a fundamentally different regime.
For retrieval-heavy ArgosBrain workflows, the output side is tiny: typically ~2K output tokens of summary against a 750K-token realistic-baseline input bill. We omit it from the headline numbers because the same agent producing the same summary on the realistic baseline pays the same output-token cost, so it cancels out of the reduction percentages.
$0 per query. The brain is a Rust process running on your laptop or CI runner. No cloud round-trip, no per-query LLM call. The only costs are (1) the LLM that drives the workflow (the numbers above), (2) the one-time ingest cost (CPU + memory for ~30s on a small repo, ~10min on a monolith), and (3) the optional license fee if you're on Pro tier — but the per-query cost is zero either way.
Yes. Provider prices drop ~50% per year on average. Our cost stays $0. The reduction percentages hold steady as input pricing gets cheaper: ArgosBrain's LLM-driving cost and both baselines scale with the same input price, so the percentage is fixed by token counts, not dollars, while the absolute dollar savings shrink in proportion. The file-read economics stay the same. We re-publish the K8s table every quarter.
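Because ArgosBrain's LLM-driving cost and both baselines scale with the same input price, the reduction percentage is a pure token ratio and the price cancels out. A quick check, using the token counts quoted on this page:

```python
REALISTIC_TOK = 750_000  # realistic-baseline input tokens
ARGOS_TOK = 22_000       # ArgosBrain workflow input tokens

reductions = set()
for price in (15.00, 7.50, 3.75):  # $/MTok, halving twice
    # Numerator and denominator both scale with `price`, so it cancels.
    reductions.add(round(1 - (ARGOS_TOK * price) / (REALISTIC_TOK * price), 6))

print(reductions)  # a single value: the percentage is price-independent
```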
```sh
curl -fsSL https://argosbrain.com/install | sh
cd ~/code/kubernetes-1.32
argosbrain ingest .
# In Claude Code: /argos-security
# Then sum the tokens reported in your status bar.
# Compare against the table in Section 04 above.
```