The Verdict — ArgosBrain

Tested at VS Code scale · 2026-04-27

Triage 156 security candidates in 8 seconds.

We pointed ArgosBrain's structural security review at Microsoft VS Code (microsoft/vscode, MIT, commit 1fa1b7a) — ~12,000 files, 151,620 symbols, 25 sink categories scanned in one pass. 156 high-severity candidates surfaced, 0 reachable from untrusted input within depth 8, 3 worth a manual review in MIT core. Total cost: $0.30 in 8 seconds. Naïve grep+LLM baseline: $18-$36 in 30-60 minutes.

Metric	Value	Detail
Symbols ingested	151,620	~12k files, TypeScript-dominant
Wall-clock	8 s	25 sink categories, end-to-end
Total cost	$0.30	98% cheaper than grep+LLM
Candidates triaged	156 → 0	high-severity → reachable from untrusted input
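The "98% cheaper" figure follows from the costs quoted above. A minimal sketch of the arithmetic, using only the published $0.30 and $18-$36 numbers; the helper name `percent_cheaper` is made up for illustration:

```python
# Figures quoted in the headline paragraph; the helper name is hypothetical.
ARGOS_COST = 0.30      # USD, full scan
BASELINE_LOW = 18.00   # USD, grep+LLM lower bound
BASELINE_HIGH = 36.00  # USD, grep+LLM upper bound


def percent_cheaper(cost: float, baseline: float) -> float:
    """Return how much cheaper `cost` is than `baseline`, in percent."""
    return (1 - cost / baseline) * 100


low = percent_cheaper(ARGOS_COST, BASELINE_LOW)    # against the $18 bound
high = percent_cheaper(ARGOS_COST, BASELINE_HIGH)  # against the $36 bound
print(f"{low:.1f}% to {high:.1f}% cheaper")  # prints "98.3% to 99.2% cheaper"
```

So "98%" is the conservative end of the range, measured against the cheaper baseline estimate.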

The triage value

grep -rn 'innerHTML' returns every XSS hit as a flat list, including DOMPurify itself, the sanitiser. ArgosBrain's structural pass adds the call-graph context that lets a reviewer (or an agent) discard the sanitiser call site, the test fixtures, and the FFI bindings in seconds.
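To make "reachable from untrusted input within depth 8" concrete, here is a generic sketch of depth-bounded reachability over a call graph. It is not ArgosBrain's implementation: the toy graph, the function names, and the `triage` helper are all assumptions made for illustration. Only the depth bound of 8 comes from the scan described above.

```python
from collections import deque

# Toy call graph: caller -> callees. Every name here is made up.
CALL_GRAPH = {
    "http_handler": ["parse_request", "render_page"],
    "parse_request": ["decode_body"],
    "render_page": ["sanitize_html"],
    "sanitize_html": ["set_inner_html_safe"],
    "test_fixture": ["set_inner_html_raw"],
}

UNTRUSTED_SOURCES = ["http_handler"]  # entry points fed by untrusted input
MAX_DEPTH = 8                         # same bound as the scan described above


def reachable_within(graph, sources, max_depth):
    """Return {function: depth} for everything reachable from `sources`
    in at most `max_depth` call edges (breadth-first search)."""
    depth = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        fn = queue.popleft()
        if depth[fn] == max_depth:
            continue  # do not expand past the depth bound
        for callee in graph.get(fn, []):
            if callee not in depth:
                depth[callee] = depth[fn] + 1
                queue.append(callee)
    return depth


def triage(candidates, graph, sources, max_depth=MAX_DEPTH):
    """Split sink candidates into reachable and unreachable buckets."""
    reach = reachable_within(graph, sources, max_depth)
    reachable = [c for c in candidates if c in reach]
    unreachable = [c for c in candidates if c not in reach]
    return reachable, unreachable


candidates = ["set_inner_html_safe", "set_inner_html_raw"]
reachable, unreachable = triage(candidates, CALL_GRAPH, UNTRUSTED_SOURCES)
print(reachable)    # sinks a reviewer should still look at
print(unreachable)  # sinks only reached from the test fixture
```

A flat grep would report both sinks with equal weight; the graph walk shows that one of them is only called from a fixture that no untrusted entry point reaches.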

Responsible disclosure boundary

Five named high-severity surfaces landed in proprietary extensions/copilot/*: three SSRF candidates, one TLS-disable, one weak-crypto. We submitted those to Microsoft's MSRC privately rather than publishing file:line. The aggregate metrics include them; the named examples are MIT core only.

Reproducible at the same SHA

Pin VS Code at 1fa1b7a, install ArgosBrain, run /argos-security-reviewer. Buckets regenerate within ±2%. No proprietary data, no dataset, no judge — same code, same brain, same numbers.
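One way to check a rerun against the published buckets is a per-bucket relative tolerance. This is a sketch under assumptions: the bucket names and the `drifted_buckets` helper are hypothetical, and reading "±2%" as a per-bucket relative bound is our interpretation. The counts 156, 0, and 3 are the ones quoted above.

```python
# Published counts from the scan above; bucket names are hypothetical.
PUBLISHED = {"high_severity": 156, "reachable": 0, "manual_review": 3}


def drifted_buckets(published, rerun, rel_tol=0.02):
    """Return {bucket: (expected, actual)} for every bucket whose rerun
    count differs from the published count by more than rel_tol."""
    drifted = {}
    for bucket, expected in published.items():
        actual = rerun.get(bucket, 0)
        allowed = rel_tol * expected  # a zero-count bucket allows no drift
        if abs(actual - expected) > allowed:
            drifted[bucket] = (expected, actual)
    return drifted


# Example rerun with a small, tolerated difference in one bucket.
ok_rerun = {"high_severity": 158, "reachable": 0, "manual_review": 3}
# Example rerun that drifts past the 2% bound.
bad_rerun = {"high_severity": 160, "reachable": 0, "manual_review": 3}

print(drifted_buckets(PUBLISHED, ok_rerun))   # empty dict: within tolerance
print(drifted_buckets(PUBLISHED, bad_rerun))  # flags high_severity
```

Note that a relative bound is strict on small buckets: any change to a bucket of 0 or 3 counts as drift under this reading.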

We do not run this scan to find vulnerabilities — Microsoft has a security team, a bug bounty, and at least four other static-analysis tools in CI. We run it to test what ArgosBrain is for: separating "scary candidates" from "actually reachable" at industrial scale, in seconds, for cents. The full writeup publishes the three MIT-core candidates with file:line for any reviewer who wants to follow up.

FULL WRITEUP → VS CODE @ 1fa1b7a → INSTALL + REPRODUCE →

Where ArgosBrain is demonstrably #1

With evidence

Claim	Evidence
Cheapest retrieval path	$0/query. Every competitor either injects prompt tokens (Cursor, Windsurf, Copilot, Continue, Cline, Aider, Roo, CLAUDE.md) or spends LLM calls (Letta on reads). ArgosBrain is free on the graph precision path — no embedding hop required for symbol queries.
Fastest retrieval at coding-agent scale	LongMemCode: p99 ≤ 0.82 ms across 16 corpora. LangMem community benchmark: p95 of 59.82 s. No other memory system publishes comparable numbers.
Only memory engine with a code-specific benchmark	LongMemCode — MIT, 20 corpora, ~8–10k scenarios, 9 categories, deterministic scoring (no LLM judge). No competitor has published on it. Mem0/Zep excel on LoCoMo/DMR — general-memory benchmarks; neither has run LongMemCode.
Only memory with symbol-precision primitives	`symbol_exists`, `resolve_member`, `naming_convention`, `list_symbols`, `check_name` — deterministic, zero-token. Nobody else ships this API. Aider's repo map is the closest (tree-sitter + PageRank) but stateless and surface-only.
Compiler-grade code understanding, tiered per language	Nobody else in this category ingests at semantic depth. Aider's 130+ languages are tree-sitter surface — no cross-file resolution, no types, no overrides. Continue's tree-sitter extracts text chunks for embedding. ArgosBrain picks per language: SCIP (Sourcegraph's production protocol — real compiler frontend) where a mature indexer exists, live LSP next, bespoke tree-sitter drivers with semantic hooks (dotted names, assignment-as-function, per-statement SQL parsing, annotation drilling) otherwise. Head of the distribution — ~95% of code actually written — runs compiler-grade.
Only in-process Rust / zero-external-DB memory graph	Graphiti needs Neo4j/FalkorDB/Kuzu/Neptune. Zep Cloud needs Zep. LangMem needs a store. ArgosBrain runs in the MCP process with bincode on disk.
File-hash refactor safety	Deterministic invalidation on content hash. Copilot does citation validation (closest cousin, 28-day expiry). Everyone else: stale until a human edits.
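The "deterministic invalidation on content hash" idea in the last row can be sketched in a few lines. This is an illustration of the general technique, not ArgosBrain's engine: the `HashedMemory` class, its method names, and the choice of SHA-256 are all assumptions.

```python
import hashlib


class HashedMemory:
    """Toy store: each fact remembers the content hash of its source file."""

    def __init__(self):
        self._facts = {}  # key -> (fact, path, content_hash)

    @staticmethod
    def digest(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def remember(self, key, fact, path, content: bytes):
        """Store a fact together with the hash of the file it came from."""
        self._facts[key] = (fact, path, self.digest(content))

    def recall(self, key, current_files):
        """Return the fact only if its source file is byte-identical to
        what was stored; otherwise drop the fact and return None."""
        entry = self._facts.get(key)
        if entry is None:
            return None
        fact, path, stored_hash = entry
        content = current_files.get(path)
        if content is None or self.digest(content) != stored_hash:
            del self._facts[key]  # stale: file changed or was removed
            return None
        return fact


# Usage: the fact survives until the file content changes.
files = {"src/user.ts": b"export class User { login() {} }"}
mem = HashedMemory()
mem.remember("User.login", "User has a login() method",
             "src/user.ts", files["src/user.ts"])
fresh = mem.recall("User.login", files)   # file unchanged: fact returned

files["src/user.ts"] = b"export class User { signIn() {} }"
stale = mem.recall("User.login", files)   # file changed: None
```

The contrast with a timer-based expiry is that nothing here depends on age: a fact is dropped the moment its file changes and kept indefinitely while the file stays the same.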

Where we're at parity

We match but don't beat

Axis	Who matches us	Honest note
Cross-session persistence	Everyone except Aider	Table stakes, not a moat.
Multi-project isolation	Mem0 (per-agent), Zep (per-user), CLAUDE.md (per-dir hash)	We're first-class but not unique.
Local-first	Windsurf, Zed, Cline, Aider, Roo, Continue, CLAUDE.md	Popular posture, not differentiation by itself.
Open source	Continue, Cline, Aider, Roo, Mem0, Letta, Graphiti, LangMem, MCP memory	Most of the category is OSS. ArgosBrain is not — engine is commercial; benchmark (LongMemCode) is MIT.

Where we lose today

Honest

Axis	Winner	What it means
IDE-native UX	Cursor, Copilot, Windsurf	Zero-install in the editor most users are in. ArgosBrain is an MCP server you configure.
Managed cloud / team sync	Mem0 Cloud, Zep Cloud, Letta Cloud	Multi-user team memory with SLAs. ArgosBrain is local-first; team sync is on the roadmap.
Agent framework + visual debugger	Letta (ADE)	We're a memory engine, not an agent platform — by design, but worth naming.
General conversational memory benchmarks	Mem0 (LoCoMo 91.6%, LME 93.4%), Zep (DMR 94.8%)	We aim to match, not beat, on LongMemEval (≥91.6% floor). Code is our moat; chat isn't.
Bi-temporal reasoning	Graphiti, Zep	Richer temporal ontology than our zone + age-category model. We'll learn from them.
Installed base / community size	Cursor, Copilot, Mem0, Cline, Aider	We're new. Market position won't reverse in a quarter.
Long-tail exotic-language coverage	Aider (130+ tree-sitter grammars, surface only)	At the long tail — COBOL, Forth, Ada, Pony — they have surface name extraction where we have nothing. Trade-off: same rows where Aider cannot answer `resolve_member` or who-overrides-X.
Prompt-optimization (procedural memory)	LangMem	A category we don't ship.

More on engineering trade-offs in the FAQ.

Per-segment recommendation

The honest friend answer

Solo dev in Cursor: use Cursor Memories. Add ArgosBrain if you also use Aider / Claude Code or your code can't leave the machine.
Team in GitHub: use Copilot Memory. Add ArgosBrain if you need local-first or cross-agent.
Claude Code primary: add ArgosBrain. Strongest fit today: MCP support + in-process engine + auto-memory at ~/.claude/projects/.
Aider user: add ArgosBrain via MCP. Aider explicitly disclaims persistent memory; we fill the gap exactly.
Cline / Roo user: ArgosBrain replaces the Memory Bank convention once your repo outgrows MD prose. Until then, Memory Bank is fine.
Conversational agent: use Mem0 or Zep. ArgosBrain isn't for you yet.
Building on LangGraph: use LangMem unless retrieval latency matters.
Need team cloud today: use Mem0 Cloud or Zep. Come back when team sync ships.

Bottom line

We win outright on: cost-per-query (vs all), latency (vs all with published numbers), code-specific benchmark (ours is the first to exist), symbol precision (unique API), compiler-grade tiered indexing (unique), in-process / no-external-DB (unique among graph engines), file-hash staleness (closest analog is Copilot's 28-day timer).

We tie on: persistence, local-first — table stakes. We lose on open-source-of-the-engine: ours is commercial. We make up for it by publishing the benchmark (MIT) so anyone can verify the numbers.

We lose today on: IDE-native UX, managed cloud, agent framework, conversational-memory benchmarks, long-tail exotic-language coverage, installed base.

If your agent writes code in a language at the head of the distribution (where ~95% of code actually lives), nothing else we've found retrieves faster, cheaper, or at greater semantic depth than ArgosBrain. Everything outside that sentence is someone else's fight.

That's the claim we defend in public. Narrow. Truthful.
