Tested at VS Code scale · 2026-04-27

Triage 156 security candidates in 8 seconds.

We pointed ArgosBrain's structural security review at Microsoft VS Code (microsoft/vscode, MIT, commit 1fa1b7a) — ~12,000 files, 151,620 symbols, 25 sink categories scanned in one pass. 156 high-severity candidates surfaced, 0 reachable from untrusted input within depth 8, 3 worth a manual review in MIT core. Total cost: $0.30 in 8 seconds. Naïve grep+LLM baseline: $18-$36 in 30-60 minutes.

151,620 symbols ingested (~12k files, TypeScript-dominant)
8 s wall-clock (25 sink categories, end-to-end)
$0.30 total cost (98% cheaper than grep+LLM)
156 → 0 candidates triaged (high-severity → reachable from untrusted input)
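
For anyone who wants to check the arithmetic behind the headline figures, here is a minimal sketch. The constants are the numbers quoted on this page; the function and variable names are ours for illustration only and are not part of ArgosBrain.

```python
# Headline figures quoted on this page. The grep+LLM baseline is a range,
# so the savings and the speedup come out as ranges too.
ARGOS_COST_USD = 0.30
ARGOS_SECONDS = 8
BASELINE_COST_USD = (18.0, 36.0)        # low and high estimate
BASELINE_SECONDS = (30 * 60, 60 * 60)   # 30 to 60 minutes


def percent_cheaper(cost, baseline):
    """Cost reduction relative to a baseline, as a percentage."""
    return (1 - cost / baseline) * 100


def speedup(seconds, baseline_seconds):
    """How many times faster than the baseline."""
    return baseline_seconds / seconds


savings = [round(percent_cheaper(ARGOS_COST_USD, b), 1) for b in BASELINE_COST_USD]
speedups = [round(speedup(ARGOS_SECONDS, s)) for s in BASELINE_SECONDS]

print(savings)   # [98.3, 99.2]
print(speedups)  # [225, 450]
```

The "98% cheaper" card rounds down from the low end of that range.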
The triage value
grep -rn 'innerHTML' returns every potential XSS hit as a flat list, including DOMPurify itself, the sanitiser. ArgosBrain's structural pass adds the call-graph context that lets a reviewer (or an agent) discard the sanitiser call site, the test fixtures, and the FFI bindings in seconds.
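
To make "call-graph context" concrete, here is a rough sketch of a depth-limited reachability check, assuming the call graph is a plain caller-to-callees mapping. This is a generic breadth-first search, not ArgosBrain's implementation; the function names and the toy graph are hypothetical.

```python
from collections import deque


def reachable_within(call_graph, sources, sink, max_depth=8):
    """True if `sink` can be reached from any source in at most `max_depth` call edges."""
    seen = set(sources)
    queue = deque((source, 0) for source in sources)
    while queue:
        node, depth = queue.popleft()
        if node == sink:
            return True
        if depth == max_depth:
            continue  # do not expand past the depth budget
        for callee in call_graph.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return False


# Hypothetical call graph: caller -> callees.
call_graph = {
    "onWebviewMessage": ["renderPreview"],
    "renderPreview": ["writePreviewHtml"],
    "testHarness": ["writeFixtureHtml"],
}
untrusted_entry_points = ["onWebviewMessage"]
candidates = ["writePreviewHtml", "writeFixtureHtml"]  # both would match a grep for innerHTML

triaged = {
    name: reachable_within(call_graph, untrusted_entry_points, name)
    for name in candidates
}
print(triaged)
```

In this toy graph, grep would report both candidates, while the reachability pass keeps only the one that an untrusted entry point can actually call. The fixture write is only reachable from the test harness and drops out.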
Responsible disclosure boundary
Five named high-severity surfaces landed in proprietary extensions/copilot/*: three SSRF candidates, one TLS-disable, one weak-crypto. We submitted those to Microsoft's MSRC privately rather than publishing file:line. The aggregate metrics include them; the named examples are MIT core only.
Reproducible at the same SHA
Pin VS Code at 1fa1b7a, install ArgosBrain, run /argos-security-reviewer. Buckets regenerate within ±2%. No proprietary data, no dataset, no judge — same code, same brain, same numbers.
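
If you re-run the scan and want to check your bucket counts against ours, a comparison could look like the sketch below. It assumes "±2%" means relative drift per bucket, which is our reading for this example. The bucket name, the re-run values, and the helper are hypothetical; only the 156 figure comes from this page.

```python
def drifted_buckets(reference, rerun, tolerance=0.02):
    """Return the buckets whose re-run count differs from the reference by more than `tolerance`."""
    drifted = {}
    for bucket, expected in reference.items():
        actual = rerun.get(bucket, 0)
        if abs(actual - expected) > tolerance * expected:
            drifted[bucket] = (expected, actual)
    return drifted


reference = {"high_severity": 156}  # count reported on this page

print(drifted_buckets(reference, {"high_severity": 158}))  # {}
print(drifted_buckets(reference, {"high_severity": 160}))  # {'high_severity': (156, 160)}
```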

We do not run this scan to find vulnerabilities — Microsoft has a security team, a bug bounty, and at least four other static-analysis tools in CI. We run it to test what ArgosBrain is for: separating "scary candidates" from "actually reachable" at industrial scale, in seconds, for cents. The full writeup publishes the three MIT-core candidates with file:line for any reviewer who wants to follow up.

The honest answer

Is ArgosBrain actually the best?

We built this page because "we're the best at everything" is a lie, and engineers smell it instantly. Here's what we actually win, what we tie, what we lose, and what we'd recommend to a friend.

Where ArgosBrain is demonstrably #1

With evidence
Claim | Evidence
Cheapest retrieval path | $0/query. Every competitor either injects prompt tokens (Cursor, Windsurf, Copilot, Continue, Cline, Aider, Roo, CLAUDE.md) or spends LLM calls (Letta on reads). ArgosBrain is free on the graph precision path: no embedding hop is required for symbol queries.
Fastest retrieval at coding-agent scale | LongMemCode: P99 ≤ 0.82 ms across 16 corpora. LangMem community benchmark: P95 59.82 s. No other memory system publishes comparable numbers.
Only memory engine with a code-specific benchmark | LongMemCode: MIT, 20 corpora, ~8–10k scenarios, 9 categories, deterministic scoring (no LLM judge). No competitor has published on it. Mem0/Zep excel on LoCoMo/DMR, which are general-memory benchmarks; neither has run LongMemCode.
Only memory with symbol-precision primitives | symbol_exists, resolve_member, naming_convention, list_symbols, check_name: deterministic, zero-token. Nobody else ships this API. Aider's repo map is the closest (tree-sitter + PageRank) but stateless and surface-only.
Compiler-grade code understanding, tiered per language | Nobody else in this category ingests at semantic depth. Aider's 130+ languages are tree-sitter surface: no cross-file resolution, no types, no overrides. Continue's tree-sitter extracts text chunks for embedding. ArgosBrain picks per language: SCIP (Sourcegraph's production protocol; real compiler frontend) where a mature indexer exists, live LSP next, bespoke tree-sitter drivers with semantic hooks (dotted names, assignment-as-function, per-statement SQL parsing, annotation drilling) otherwise. The head of the distribution (~95% of code actually written) runs compiler-grade.
Only in-process Rust / zero-external-DB memory graph | Graphiti needs Neo4j/FalkorDB/Kuzu/Neptune. Zep Cloud needs Zep. LangMem needs a store. ArgosBrain runs in the MCP process with bincode on disk.
File-hash refactor safety | Deterministic invalidation on content hash. Copilot does citation validation (closest cousin, 28-day expiry). Everyone else: stale until a human edits.
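
The file-hash row is the easiest one to picture in code. Below is a minimal sketch of content-hash invalidation in general, assuming facts are stored per file alongside the hash of the content they were derived from. It is a toy illustration, not ArgosBrain's engine; the class, its methods, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


class HashedFactStore:
    """Toy memory store: each file's facts are tagged with the hash of the content they came from."""

    def __init__(self):
        self._facts = {}  # path -> (content_hash, facts)

    @staticmethod
    def content_hash(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def remember(self, path, content, facts):
        self._facts[path] = (self.content_hash(content), list(facts))

    def recall(self, path, current_content):
        """Return stored facts only if the file content is byte-identical to what was indexed."""
        entry = self._facts.get(path)
        if entry is None:
            return None
        stored_hash, facts = entry
        if stored_hash != self.content_hash(current_content):
            del self._facts[path]  # content changed: drop the stale facts
            return None
        return facts


store = HashedFactStore()
original = b"export function parseUri(value) {}"
store.remember("src/uri.ts", original, ["parseUri is exported from src/uri.ts"])

print(store.recall("src/uri.ts", original))
print(store.recall("src/uri.ts", b"export function parseURI(value) {}"))
```

The second recall returns nothing because the rename changed the file's bytes. Invalidation here depends only on content, not on a timer or on someone remembering to edit the memory.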

Where we're at parity

We match but don't beat
Axis | Who matches us | Honest note
Cross-session persistence | Everyone except Aider | Table stakes, not a moat.
Multi-project isolation | Mem0 (per-agent), Zep (per-user), CLAUDE.md (per-dir hash) | We're first-class but not unique.
Local-first | Windsurf, Zed, Cline, Aider, Roo, Continue, CLAUDE.md | Popular posture, not differentiation by itself.
Open source | Continue, Cline, Aider, Roo, Mem0, Letta, Graphiti, LangMem, MCP memory | Most of the category is OSS. ArgosBrain is not: the engine is commercial; the benchmark (LongMemCode) is MIT.

Where we lose today

Honest
Axis | Winner | What it means
IDE-native UX | Cursor, Copilot, Windsurf | Zero-install in the editor most users are in. ArgosBrain is an MCP server you configure.
Managed cloud / team sync | Mem0 Cloud, Zep Cloud, Letta Cloud | Multi-user team memory with SLAs. ArgosBrain is local-first; team sync is on the roadmap.
Agent framework + visual debugger | Letta (ADE) | We're a memory engine, not an agent platform. That is by design, but worth naming.
General conversational memory benchmarks | Mem0 (LoCoMo 91.6%, LME 93.4%), Zep (DMR 94.8%) | We target match, not beat, on LongMemEval (≥91.6% floor). Code is our moat; chat isn't.
Bi-temporal reasoning | Graphiti, Zep | Richer temporal ontology than our zone + age-category model. We'll learn from them.
Installed base / community size | Cursor, Copilot, Mem0, Cline, Aider | We're new. Market position won't reverse in a quarter.
Long-tail exotic-language coverage | Aider (130+ tree-sitter grammars, surface only) | At the long tail (COBOL, Forth, Ada, Pony) they have surface name extraction where we have nothing. Trade-off: for those same languages Aider cannot answer resolve_member or who-overrides-X.
Prompt-optimization (procedural memory) | LangMem | A category we don't ship.

More on engineering trade-offs in the FAQ.

Per-segment recommendation

The honest friend answer
Bottom line

We win outright on: cost-per-query (vs all), latency (vs all with published numbers), code-specific benchmark (the first to exist), symbol precision (unique API), compiler-grade tiered indexing (unique), in-process / no-external-DB (unique among graph engines), file-hash staleness handling (closest analog is Copilot's 28-day expiry).

We tie on: persistence, multi-project isolation, local-first (all table stakes). We lose on open-source-of-the-engine: ours is commercial. We make up for it by publishing the benchmark (MIT) so anyone can verify the numbers.

We lose today on: IDE-native UX, managed cloud, agent framework, conversational-memory benchmarks, bi-temporal reasoning, installed base, long-tail exotic-language coverage, prompt optimization.

If your agent writes code in a language at the head of the distribution (where ~95% of code actually lives), nothing else we've found retrieves faster, cheaper, or at greater semantic depth than ArgosBrain. Everything outside that sentence is someone else's fight.

That's the claim we defend in public. Narrow. Truthful.