From 156 Candidates to 3 Worth Reviewing — Security Triage at VS Code Scale

Why we ran this

VS Code is the most-used code editor on Earth. Microsoft ships it as MIT open source, the repo carries roughly twelve thousand source files across TypeScript, JavaScript, CSS, and a Rust CLI, and it has been forked into Cursor, Windsurf, Trae, and a half-dozen other AI-IDE derivatives. It is the natural successor to our Kubernetes-scale stress test: same "industrial codebase" target, but in TypeScript, with a different sink surface, and with a known-hard mix of security-relevant patterns (inter-process renderers, shell-command spawns, fetch URL plumbing, WebView innerHTML sinks).

We did not run this scan to find vulnerabilities. Microsoft has a dedicated security team, an active bug bounty, and at least four other static-analysis tools running in CI. We ran it to test what ArgosBrain is actually for: separating "scary candidates" from "actually reachable from untrusted input" at industrial scale, in seconds, for cents.

The corpus

Numbers, all reproducible by ingesting the same commit:

github.com/microsoft/vscode @ 1fa1b7af5c190606cdd5e8fe5e5f1ca4fad47e00 (main, 2026-04-25)
~12,000 source files (TypeScript dominant, with CSS, Rust in cli/, JS in renderer prelude)
151,620 symbols ingested into the brain
25 sink categories scanned in this pass: SSRF, XSS, SQLi, RCE, command injection, path traversal, deserialisation, XXE, LDAP, open redirect, buffer overflow, regex DoS, timing attack, crypto IV reuse, weak crypto, hardcoded secret, cloud API key, private key block, prototype pollution, TLS verification disabled, insecure random, unsafe Rust, CORS wildcard, cookie insecure, JWT none

The result, in one chart

The "high" bucket comes from heuristic exploit-score signals (caller fanout, argument shape) without a proven untrusted source. The reachability pass then walks the call graph from each sink up to depth 8, looking for tainted source markers (HTTP handler input, CLI args, file content, env var read). Of the ~340 sinks scanned across all categories with hits, zero were structurally reachable from such a source within depth 8.
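The upward walk described above can be sketched as a depth-limited breadth-first search over a reverse call graph. This is a minimal illustration under stated assumptions, not ArgosBrain's implementation: the `CallerIndex` shape, the `findSourcePath` function, and the demo function names are all hypothetical.

```typescript
// Hypothetical shapes for illustration; not ArgosBrain's real data model.
type CallerIndex = Map<string, string[]>; // callee name -> its direct callers

interface Step {
  node: string;
  path: string[]; // sink first, most recent caller last
}

// Walk from a sink up through its callers, at most maxDepth edges,
// and return the first path that ends at a known untrusted source.
function findSourcePath(
  callersOf: CallerIndex,
  sink: string,
  sources: Set<string>,
  maxDepth: number = 8,
): string[] | null {
  let frontier: Step[] = [{ node: sink, path: [sink] }];
  const seen = new Set<string>([sink]);

  for (let depth = 0; depth <= maxDepth; depth++) {
    const next: Step[] = [];
    for (const step of frontier) {
      if (sources.has(step.node)) return step.path; // structurally reachable
      for (const caller of callersOf.get(step.node) ?? []) {
        if (seen.has(caller)) continue; // skip nodes already visited (cycles)
        seen.add(caller);
        next.push({ node: caller, path: [...step.path, caller] });
      }
    }
    frontier = next;
  }
  return null; // no source found within the depth budget
}

// Made-up call chain: three edges separate the sink from the source.
const demoCallers: CallerIndex = new Map<string, string[]>([
  ["setInnerHtml", ["renderCell"]],
  ["renderCell", ["onKernelMessage"]],
  ["onKernelMessage", ["handleHttpRequest"]],
]);
const demoSources = new Set<string>(["handleHttpRequest"]);

console.log(findSourcePath(demoCallers, "setInnerHtml", demoSources));    // path of 4 nodes
console.log(findSourcePath(demoCallers, "setInnerHtml", demoSources, 2)); // null: cap too low
```

The depth cap is what makes "zero reachable within depth 8" a bounded claim: a source nine edges away would not be reported by a walk like this one.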

Where the 156 "high" came from (the noise breakdown)

The honest part of running a security tool is showing the false-positive profile. Almost none of these 156 are actionable. The triage report classified them at the time of scan:

Category	High	Why "high" but not actionable
Insecure random	58	`Math.random()` in tests, animations, sampling — not used for tokens, sessions, or keys
SSRF	42	`fetch()` with hardcoded service endpoints, mostly in proprietary Copilot extension (disclosed privately to MSRC, see below)
XSS	19	`.innerHTML` in trusted-source renderers (notebook cell output, ghost-text). Includes a hit on the bundled `dompurify.js` file itself — the sanitiser, not a sink
Cloud API key	16	Mailgun-style `key-` prefix matched non-secret identifiers (`secretStorageKeyPath`, settings keys, charCode enums). The `ghp_…`, `AKIA…`, `sk-ant-…` hits were all in `secretFilter.spec.ts` — intentional fixtures for the secret-redaction filter test
Unsafe Rust	16	All in `cli/src/` — Win32 API bindings, glibc version probe, file metadata. `unsafe` is required for FFI, and each block has a `SAFETY:` comment
Prototype pollution	2	`__proto__` references in config `merge()` and debug glue — likely guards rather than write paths
TLS verification disabled	1	`rejectUnauthorized: false` in a proxy debug helper (Copilot extension)
Hardcoded secret	1	`accessToken: 'gho_mock_e2e_test_token_…'` in an end-to-end mock auth fixture, explicitly named `e2e-mock`
Weak crypto	1	MD5/SHA1 in an external ingest client — almost certainly content-addressing, not auth
TOTAL	156	0 confirmed reachable from untrusted input

The triage value is the third column. A naïve `grep -rn 'innerHTML'` surfaces every one of the 19 XSS hits flat, including the sanitiser file. ArgosBrain emits the same finding but adds the structural context that lets a human (or an agent) say "this hit is the DOMPurify sanitiser itself, not a sink: discard". Same for the secret-filter test fixtures and the FFI unsafe blocks.

The three MIT-core candidates worth a look

Three findings landed in src/ (MIT-licensed VS Code core). Reachability says zero of them have an untrusted-source path within depth 8, but the patterns are interesting enough that we publish them here for the next person doing a manual review. None of these are vulnerability claims — they are starting points for a human reviewer who can confirm taint flow with field-level analysis tools (Semgrep Pro, CodeQL).

Pattern	File / range	Why ArgosBrain flagged it
SSRF candidate	extHostMcp.ts:331-878 `McpHTTPHandle`	Outbound HTTP wrapper. URL host is parameterised; SSRF risk only if host is user-controllable across a process boundary. Worth verifying the input chain.
XSS candidate	webviewPreloads.ts notebook renderer prelude	3,000+ lines of WebView prelude with multiple `.innerHTML` sites. Notebook output is normally trusted (kernel-controlled), but the surface is large enough to merit a sanitisation audit.
Prototype pollution	configuration.ts:337-351 config `merge()`	`__proto__` referenced inside a recursive merge. If the function does not reject `__proto__` as a key, untrusted JSON config could mutate `Object.prototype`. Worth a unit test that asserts the guard.
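For the prototype-pollution row, the kind of guard and assertion a reviewer might look for can be sketched as follows. This is a hypothetical `safeMerge`, written for illustration only; it is not the `merge()` in configuration.ts, and whether that function already contains an equivalent guard is exactly what the suggested unit test would confirm.

```typescript
type Json = { [key: string]: unknown };

// Keys that can reach Object.prototype when written through a recursive merge.
const FORBIDDEN_KEYS = new Set<string>(["__proto__", "constructor", "prototype"]);

function isPlainObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical recursive merge that skips forbidden keys instead of descending into them.
function safeMerge(target: Json, source: Json): Json {
  for (const key of Object.keys(source)) {
    if (FORBIDDEN_KEYS.has(key)) continue; // the guard under test
    const incoming = source[key];
    const existing = target[key];
    if (isPlainObject(incoming) && isPlainObject(existing)) {
      safeMerge(existing, incoming);
    } else {
      target[key] = incoming;
    }
  }
  return target;
}

// JSON.parse creates "__proto__" as an ordinary own property, so it shows up in Object.keys.
const hostile = JSON.parse(
  '{"__proto__": {"polluted": true}, "editor": {"fontSize": 14}}',
) as Json;
const merged = safeMerge({ editor: { tabSize: 2 } }, hostile);

console.log(merged);
console.log(({} as Json)["polluted"]); // undefined when the guard holds
```

Without the `FORBIDDEN_KEYS` check, the lookup `target["__proto__"]` would resolve to `Object.prototype`, and the recursive call would write `polluted` onto every object in the process.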

Each link points at the exact commit so the line numbers stay valid even as the file evolves on main. Microsoft's CodeQL pipeline almost certainly already covers these patterns; the value of running ArgosBrain separately is the speed of triage when an outsider wants a same-day answer to "is this codebase critically vulnerable?" without paying for a SAST seat.

What we are not publishing — the responsible-disclosure boundary

Five named high-severity surfaces in the full report were inside extensions/copilot/* (three SSRF candidates in HTTP API clients, one TLS-verification-disabled in a logging proxy helper, one weak-crypto candidate in a workspace-search ingest client). That code ships with the proprietary GitHub Copilot extension, not with VS Code OSS — it is Microsoft IP under a separate licence.

Publishing file:line excerpts from proprietary code, even when our findings are candidates rather than confirmed exploits, is the kind of move that gets a security writeup pulled. So we did the responsible thing:

The five Copilot-extension findings are being submitted privately to the Microsoft Security Response Center.
The aggregate metrics in this writeup (151,620 symbols, 156 high, $0.30, 8 seconds) include those findings — they are part of the scan.
We do not name the files, do not link them on GitHub, and do not excerpt them. If MSRC concludes any of them are real, Microsoft will publish; we will not pre-empt that.

This is the boundary that distinguishes "marketing-with-a-fig-leaf" from "responsible engineering writeup". ArgosBrain's strongest pitch is "we make security review fast and cheap", and that pitch only survives if the tool's operators behave like a reviewer, not a pentester-on-a-clout-arc.

Cost and wall-clock

The cost story matters because the alternative on a 12,000-file codebase is not actually free. A naïve approach (grep for every dangerous pattern, dump the matched files into an LLM, ask it to triage) burns 6-12 million tokens for the kind of sweep ArgosBrain does in 95,000 tokens. At Opus 4.7's prompt rate, that is the difference between about thirty cents and up to roughly forty dollars per sweep. Multiplied by every CI run, every PR, every release branch, the gap compounds.
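The arithmetic behind that comparison can be checked with a few lines. This sketch assumes the whole $0.30 is attributable to the 95,000 tokens at a single flat per-token rate, and that the naïve sweep is billed at that same rate. The per-million rate below is derived from the writeup's own figures, not taken from a published price list.

```typescript
// Figures quoted in this writeup.
const ARGOS_TOKENS = 95000;
const ARGOS_COST_USD = 0.30;

// Assumption: one flat rate covers the whole scan.
const impliedUsdPerMillionTokens = ARGOS_COST_USD / (ARGOS_TOKENS / 1000000);

function sweepCostUsd(tokens: number): number {
  return (tokens / 1000000) * impliedUsdPerMillionTokens;
}

const naiveLowUsd = sweepCostUsd(6000000);   // lower bound of the 6-12M range
const naiveHighUsd = sweepCostUsd(12000000); // upper bound of the 6-12M range

console.log(impliedUsdPerMillionTokens.toFixed(2)); // 3.16
console.log(naiveLowUsd.toFixed(2));                // 18.95
console.log(naiveHighUsd.toFixed(2));               // 37.89
```

Under these assumptions the forty-dollar figure corresponds to the 12-million-token end of the range, and the 6-million-token end comes out at about nineteen dollars.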

Reproduce it yourself

# 1. Install ArgosBrain (Free tier, no credit card)
curl -fsSL https://argosbrain.com/install | sh

# 2. Clone VS Code at the same commit
git clone https://github.com/microsoft/vscode.git
cd vscode
git checkout 1fa1b7af5c190606cdd5e8fe5e5f1ca4fad47e00

# 3. Initialise ArgosBrain on this project
argosbrain init

# 4. From your Cursor / Claude Code chat, run:
/argos-security-reviewer

# Expected: the 0/156/180+ severity buckets, give or take about 2%.
# Wall-clock 8-12 s on M-series laptops. Cost ~$0.30 in tokens.

What this scan does and does not measure

The structural reachability pass measures call-graph paths from sources (HTTP handler input, CLI argv, file content, env vars) to sinks (the dangerous patterns enumerated in the corpus section). It does not measure:

Field-level taint flow. A reachable call graph does not prove tainted data flows along it — a sanitiser on the path may neutralise the risk. Pair with Semgrep Pro or CodeQL for that.
Dynamic dispatch / reflection. Code paths that resolve at runtime (DI containers, eval-constructed callables, monkey-patched modules) are invisible to the static call graph.
Cross-process trust boundaries. VS Code's renderer / extension host / main process split is a real authorisation boundary; the scan treats it as one graph.
Confirmed exploits. Every "high" in this report is a candidate. The value is the speed of narrowing the candidates, not declaring real CVEs.
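The first limitation above is easy to see in a toy example. In this sketch every function name is hypothetical. A call graph contains an edge from the handler (which touches an untrusted source) to the sink, so a structural pass would report the sink as reachable, yet the value is escaped before it arrives.

```typescript
// Escape the five characters that matter when text is placed into HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Source: stands in for attacker-controlled input such as an HTTP body.
function readUntrustedInput(): string {
  return '<img src=x onerror="alert(1)">';
}

// Sink: builds markup that a caller would assign to innerHTML.
function renderLabel(label: string): string {
  return `<span class="label">${label}</span>`;
}

// The call graph has handleRequest -> renderLabel, so the sink is structurally
// reachable from the source, but the data passes through escapeHtml first.
function handleRequest(): string {
  return renderLabel(escapeHtml(readUntrustedInput()));
}

console.log(handleRequest());
// <span class="label">&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</span>
```

Telling this case apart from an unsanitised one requires tracking the value itself, which is the job of a field-level taint tool.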

Try it

One command, sixty seconds:

curl -fsSL https://argosbrain.com/install | sh

Free tier ships with every retrieval feature, every sink scanner, and every skill in the catalogue — for one active project at a time, no node cap. Upgrade to Pro ($19/month) for unlimited active projects. Pricing on the homepage.

Authors: ArgosBrain Team · Date: 2026-04-27 · License: CC BY 4.0 · Corpus: microsoft/vscode @ 1fa1b7a (MIT) · Disclosure: findings inside extensions/copilot/* submitted privately to Microsoft Security Response Center