Engineering writeup · 2026-04-27

From 156 Candidates to 3 Worth Reviewing — Security Triage at VS Code Scale

We pointed ArgosBrain's structural security review at Microsoft VS Code (github.com/microsoft/vscode, MIT-licensed, commit 1fa1b7a). 12,000 files, 151,620 symbols, 25 sink categories. Result: 156 high-severity candidates surfaced, 0 reachable from untrusted input within depth 8, 3 worth manual review in core, in 8 seconds for $0.30. Naïve grep+LLM baseline for the same coverage: $18-$36 and 30-60 minutes. The article publishes the three MIT-core findings with file:line. Findings inside proprietary extensions/copilot/* were disclosed privately to Microsoft MSRC and are not reproduced here.

Reproducible · MIT corpus

Every aggregate number in this writeup regenerates from the public microsoft/vscode @ 1fa1b7a. Install ArgosBrain, ingest the repo, run /argos-security-reviewer, and get the same buckets within ±2% (small variance from optional dependencies and minor library updates between scans).


Why we ran this

VS Code is one of the most widely used code editors in the world. Microsoft ships it as MIT open source, the repo carries roughly twelve thousand source files across TypeScript, JavaScript, CSS, and a Rust CLI, and it has been forked into Cursor, Windsurf, Trae, and a half-dozen other AI-IDE derivatives. It is the natural successor to our Kubernetes-scale stress test: same "industrial codebase" target, but in TypeScript, with a different sink surface, and with a known-hard mix of security-relevant patterns (inter-process renderers, shell-command spawns, fetch URL plumbing, WebView innerHTML sinks).

We did not run this scan to find vulnerabilities. Microsoft has a dedicated security team, an active bug bounty, and at least four other static-analysis tools running in CI. We ran it to test what ArgosBrain is actually for: separating "scary candidates" from "actually reachable from untrusted input" at industrial scale, in seconds, for cents.

The corpus

Numbers, all reproducible by ingesting the same commit:

The result, in one chart

Severity distribution (25 sink categories)
Critical: 0
High: 156
Medium: 180+
Reachable: 0 of ~340 sinks confirmed reachable from untrusted input within depth 8
In MIT core: 3 worth manual review (cited below with file:line)

The "high" bucket comes from heuristic exploit-score signals (caller fanout, argument shape) without a proven untrusted source. The reachability pass then walks the call graph from each sink up to depth 8, looking for tainted source markers (HTTP handler input, CLI args, file content, env var read). Of the ~340 sinks scanned across all categories with hits, zero were structurally reachable from such a source within depth 8.
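The depth-bounded walk described above can be sketched as a breadth-first search over a reversed call graph. This is a minimal illustration under stated assumptions, not ArgosBrain's engine: the `CallerGraph` shape, the function names, and the sample graph are all hypothetical.

```typescript
// Hypothetical call graph: maps each function to the functions that call it.
type CallerGraph = Map<string, string[]>;

/**
 * Walk callers upward from `sink`, breadth-first, for at most `maxDepth` hops.
 * Returns the first call path that ends at a tainted source (ordered source
 * first, sink last), or null if no source is found within the depth limit.
 */
function findTaintedPath(
  callers: CallerGraph,
  sink: string,
  taintedSources: Set<string>,
  maxDepth: number,
): string[] | null {
  const visited = new Set<string>([sink]);
  let frontier: string[][] = [[sink]]; // each entry is a path, sink first

  for (let depth = 0; depth <= maxDepth; depth++) {
    const next: string[][] = [];
    for (const path of frontier) {
      const node = path[path.length - 1];
      if (node === undefined) continue;
      if (taintedSources.has(node)) {
        return [...path].reverse();
      }
      for (const caller of callers.get(node) ?? []) {
        if (!visited.has(caller)) {
          visited.add(caller);
          next.push([...path, caller]);
        }
      }
    }
    frontier = next;
  }
  return null;
}

// Invented sample graph: execShell is called by runTask, which is called by
// an HTTP handler; renderBadge has no tainted caller.
const callers: CallerGraph = new Map<string, string[]>([
  ["execShell", ["runTask"]],
  ["runTask", ["handleRequest"]],
  ["renderBadge", ["drawStatusBar"]],
]);
const sources = new Set<string>(["handleRequest"]);

const reachable = findTaintedPath(callers, "execShell", sources, 8);
const tooShallow = findTaintedPath(callers, "execShell", sources, 1);
const unreachable = findTaintedPath(callers, "renderBadge", sources, 8);
```

In this toy graph, `reachable` is the path from `handleRequest` to `execShell`, while `tooShallow` and `unreachable` are both `null`. The second case shows why the depth limit matters: a result of "zero reachable" only holds for paths of up to eight hops, not for the whole graph.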

Where the 156 "high" came from (the noise breakdown)

The honest part of running a security tool is showing the false-positive profile. Almost none of these 156 are actionable. The triage report classified them at the time of scan:

Category | High | Why "high" but not actionable
Insecure random | 58 | Math.random() in tests, animations, sampling; not used for tokens, sessions, or keys
SSRF | 42 | fetch() with hardcoded service endpoints, mostly in proprietary Copilot extension (disclosed privately to MSRC, see below)
XSS | 19 | .innerHTML in trusted-source renderers (notebook cell output, ghost-text). Includes a hit on the bundled dompurify.js file itself, which is the sanitiser, not a sink
Cloud API key | 16 | Mailgun-style key- prefix matched non-secret identifiers (secretStorageKeyPath, settings keys, charCode enums). The ghp_…, AKIA…, sk-ant-… hits were all in secretFilter.spec.ts, which holds intentional fixtures for the secret-redaction filter test
Unsafe Rust | 16 | All in cli/src/: Win32 API bindings, glibc version probe, file metadata. unsafe is required for FFI, and each block has a SAFETY: comment
Prototype pollution | 2 | __proto__ references in config merge() and debug glue; likely guards rather than write paths
TLS verification disabled | 1 | rejectUnauthorized: false in a proxy debug helper (Copilot extension)
Hardcoded secret | 1 | accessToken: 'gho_mock_e2e_test_token_…' in an end-to-end mock auth fixture, explicitly named e2e-mock
Weak crypto | 1 | MD5/SHA1 in an external ingest client; almost certainly content-addressing, not auth
TOTAL | 156 | 0 confirmed reachable from untrusted input

The triage value is the final column, the "why". A naïve grep -rn 'innerHTML' surfaces every one of the 19 XSS hits flat, including the sanitiser file. ArgosBrain emits the same finding but adds the structural context that lets a human (or an agent) say "DOMPurify is a sanitiser callsite, not a sink; discard". The same goes for the secret-filter test fixtures and the FFI unsafe blocks.
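To make the discard step concrete, here is a minimal sketch of how a reviewer or agent might apply two of those context rules to a list of findings. The `Finding` record, the `Verdict` labels, and the rules themselves are hypothetical; ArgosBrain's actual output format and triage logic are not shown in this writeup. Only the file names come from the table above.

```typescript
// Hypothetical finding record, for illustration only.
interface Finding {
  category: string;
  file: string;
}

type Verdict = "discard: test fixture" | "discard: sanitiser" | "review";

// Apply two context rules: spec/test files are fixtures, and an XSS hit
// inside the bundled DOMPurify file is the sanitiser itself, not a sink.
function triage(finding: Finding): Verdict {
  const file = finding.file.toLowerCase();
  if (/\.(spec|test)\.ts$/.test(file)) {
    return "discard: test fixture";
  }
  if (finding.category === "XSS" && file.endsWith("dompurify.js")) {
    return "discard: sanitiser";
  }
  return "review";
}

const findings: Finding[] = [
  { category: "Cloud API key", file: "secretFilter.spec.ts" },
  { category: "XSS", file: "dompurify.js" },
  { category: "XSS", file: "webviewPreloads.ts" },
];
const verdicts = findings.map(triage);
```

Path rules like these are deliberately crude. They are cheap to audit, but they only cut noise; a hit that survives them still needs the reachability pass or a human look.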

The three MIT-core candidates worth a look

Three findings landed in src/ (MIT-licensed VS Code core). Reachability says zero of them have an untrusted-source path within depth 8, but the patterns are interesting enough that we publish them here for the next person doing a manual review. None of these are vulnerability claims — they are starting points for a human reviewer who can confirm taint flow with field-level analysis tools (Semgrep Pro, CodeQL).

Pattern | File / range | Why ArgosBrain flagged it
SSRF candidate | extHostMcp.ts:331-878 (McpHTTPHandle) | Outbound HTTP wrapper. URL host is parameterised; SSRF risk only if host is user-controllable across a process boundary. Worth verifying the input chain.
XSS candidate | webviewPreloads.ts (notebook renderer prelude) | 3,000+ lines of WebView prelude with multiple .innerHTML sites. Notebook output is normally trusted (kernel-controlled), but the surface is large enough to merit a sanitisation audit.
Prototype pollution | configuration.ts:337-351 (config merge()) | __proto__ referenced inside a recursive merge. If the function does not reject __proto__ as a key, untrusted JSON config could mutate Object.prototype. Worth a unit test that asserts the guard.
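For the prototype-pollution row, the kind of guard and assertion a reviewer would look for can be sketched as follows. This is a generic recursive merge written for illustration; `safeMerge`, `FORBIDDEN_KEYS`, and the hostile payload are assumptions and do not reproduce the code in configuration.ts.

```typescript
type JsonObject = { [key: string]: unknown };

// Keys that can reach Object.prototype when assigned during a merge.
const FORBIDDEN_KEYS = new Set<string>(["__proto__", "constructor", "prototype"]);

function isPlainObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Recursively merge `source` into `target`, skipping prototype-polluting keys. */
function safeMerge(target: JsonObject, source: JsonObject): JsonObject {
  for (const key of Object.keys(source)) {
    if (FORBIDDEN_KEYS.has(key)) continue; // the guard under test
    const incoming = source[key];
    const existing = target[key];
    if (isPlainObject(incoming) && isPlainObject(existing)) {
      safeMerge(existing, incoming);
    } else if (isPlainObject(incoming)) {
      target[key] = safeMerge({}, incoming); // copy so target never aliases source
    } else {
      target[key] = incoming;
    }
  }
  return target;
}

// JSON.parse creates "__proto__" as an own property, which is how an
// untrusted config file would carry the payload.
const hostile = JSON.parse('{"__proto__": {"polluted": true}, "editor": {"fontSize": 14}}');
const merged = safeMerge({}, hostile);

// The assertion a unit test would make: a fresh object must not inherit the key.
const probe: JsonObject = {};
const pollutionDetected = probe["polluted"] !== undefined;
```

Without the `FORBIDDEN_KEYS` check, reading `target["__proto__"]` returns `Object.prototype`, the merge recurses into it, and every object in the process gains a `polluted` property. A test for the real merge would assert the same thing as `pollutionDetected` here.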

Each link points at the exact commit so the line numbers stay valid even as the file evolves on main. Microsoft's CodeQL pipeline almost certainly already covers these patterns; the value of running ArgosBrain separately is the speed of triage when an outsider wants a same-day answer to "is this codebase critically vulnerable?" without paying for a SAST seat.

What we are not publishing — the responsible-disclosure boundary

Five named high-severity surfaces in the full report were inside extensions/copilot/* (three SSRF candidates in HTTP API clients, one TLS-verification-disabled in a logging proxy helper, one weak-crypto candidate in a workspace-search ingest client). That code ships with the proprietary GitHub Copilot extension, not with VS Code OSS — it is Microsoft IP under a separate licence.

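Publishing file:line excerpts from proprietary code, even when our findings are candidates rather than confirmed exploits, is the kind of move that gets a security writeup pulled. So we did the responsible thing: the findings inside extensions/copilot/* were disclosed privately to Microsoft MSRC and are not reproduced here.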

This is the boundary that distinguishes "marketing-with-a-fig-leaf" from "responsible engineering writeup". ArgosBrain's strongest pitch is "we make security review fast and cheap", and that pitch only survives if the tool's operators behave like a reviewer, not a pentester-on-a-clout-arc.

Cost and wall-clock

Security pass: ArgosBrain vs naive grep+LLM baseline
ArgosBrain: $0.30 · 8 s · 36 MCP calls · 25 sink categories scanned
Naive grep+LLM: $18-$36 · 30-60 min · ~6-12M tokens · same coverage, larger noise floor
Delta: ~98% cheaper · ~340× faster · same surface, structured triage
Token math: 95k tokens at Opus 4.7 prompt rate vs 6-12M tokens streaming whole files for pattern matching.

The cost story matters because the alternative on a 12,000-file codebase is not actually free. A naïve approach (grep for every dangerous pattern, dump the matched files into an LLM, ask it to triage) burns 6-12 million tokens for the kind of sweep ArgosBrain does in 95,000 tokens. At Opus 4.7's prompt rate, that is the difference between thirty cents and $18-$36. Multiplied by every CI run, every PR, every release branch, the gap compounds.
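The arithmetic behind those figures can be checked in a few lines. The rate of $3 per million prompt tokens is an assumption inferred from the article's own numbers ($18-$36 for 6-12M tokens), not a quoted price; substitute the current rate to redo the comparison.

```typescript
// ASSUMPTION: flat prompt rate implied by the article's $18-$36 for 6-12M tokens.
const ASSUMED_USD_PER_MILLION_TOKENS = 3;

function costUsd(tokens: number, usdPerMillion: number): number {
  return (tokens / 1000000) * usdPerMillion;
}

const argosCost = costUsd(95000, ASSUMED_USD_PER_MILLION_TOKENS);       // about $0.29
const baselineLow = costUsd(6000000, ASSUMED_USD_PER_MILLION_TOKENS);   // $18
const baselineHigh = costUsd(12000000, ASSUMED_USD_PER_MILLION_TOKENS); // $36

// Fraction saved against each end of the baseline range.
const savingVsLow = 1 - argosCost / baselineLow;   // about 0.984
const savingVsHigh = 1 - argosCost / baselineHigh; // about 0.992

// Wall-clock ratio: 30-60 minutes against 8 seconds.
const speedupLow = (30 * 60) / 8;  // 225
const speedupHigh = (60 * 60) / 8; // 450
```

Under this assumption the saving is 98-99%, and the quoted ~340× speedup sits near the middle of the 225× to 450× range.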

Reproduce it yourself

# 1. Install ArgosBrain (Free tier, no credit card)
curl -fsSL https://argosbrain.com/install | sh

# 2. Clone VS Code at the same commit
git clone https://github.com/microsoft/vscode.git
cd vscode
git checkout 1fa1b7af5c190606cdd5e8fe5e5f1ca4fad47e00

# 3. Initialise ArgosBrain on this project
argosbrain init

# 4. From your Cursor / Claude Code chat, run:
/argos-security-reviewer

# Expected: the 0/156/180+ severity buckets (critical/high/medium), within ±2%.
# Wall-clock 8-12 s on M-series laptops. Cost ~$0.30 in tokens.

What this scan does and does not measure

The structural reachability pass measures call-graph paths from sources (HTTP handler input, CLI argv, file content, env vars) to sinks (the dangerous patterns enumerated in the corpus section). It does not measure:

Try it

One command, sixty seconds:

curl -fsSL https://argosbrain.com/install | sh

Free tier ships with every retrieval feature, every sink scanner, and every skill in the catalogue — for one active project at a time, no node cap. Upgrade to Pro ($19/month) for unlimited active projects. Pricing on the homepage.


Authors: ArgosBrain Team · Date: 2026-04-27 · License: CC BY 4.0 · Corpus: microsoft/vscode @ 1fa1b7a (MIT) · Disclosure: findings inside extensions/copilot/* submitted privately to Microsoft Security Response Center