A local, deterministic, MCP-native memory layer for autonomous agents. Drop-in for LangChain, OpenAI Agent SDK, Claude Agent SDK, Cline, Cursor, custom agents — anything that speaks Model Context Protocol. Sub-millisecond retrieval. ~150× token reduction on codebase queries. Free tier: unlimited projects, 20k nodes per project.
Build an agent that operates on a real codebase and you'll watch it burn 5,000–50,000 tokens for each "find where X is defined" or "what calls Y." Not because the question is hard — because the standard answer is grep + read + LLM-summarize, and your agent does it on every turn.
At 500 codebase queries per day — modest for any production agent — you're spending $30–$100/day per agent on questions a graph database could answer in 1ms for free. Multiply by your fleet, by your monthly active runs, by the number of times the agent re-asks the same thing because it lost context. The bill compounds. Your margins don't.
The cloud-memory crowd (Mem0, Zep, Letta) trades LLM tokens for API calls + cloud round-trips + data exfil to their servers. Better than naive RAG, still wrong shape for code-aware agents and still rate-limited at scale.
Per-query cost for a typical autonomous-agent workload, priced at $3/M input tokens (Claude Sonnet-class rates; roughly 5× higher on Opus 4.7 at $15/M input, $75/M output):
| Operation | Naive RAG | ArgosBrain | Saved per call |
|---|---|---|---|
| "Where is validateUser defined?" | 8,000 tk | ~120 tk | $0.024 |
| "What calls auth.middleware?" | 15,000 tk | ~180 tk | $0.045 |
| "Audit codebase for SSRF surface" | 200,000 tk | ~400 tk | $0.60 |
| "Module boundary of OrderService" | 25,000 tk | ~300 tk | $0.074 |
| "Did this PR add new attack surface?" | 80,000 tk | ~250 tk | $0.24 |
An autonomous agent doing 500 such queries per day on a codebase: ~$45/day → ~$1,350/month saved per agent. The dashboard ships a live token-saved counter so you watch the savings accumulate in real time, not on a vendor's invoice 30 days late.
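A back-of-envelope for that figure at the $3/M rate. The query mix below is illustrative, not measured; tune the counts to your own agent's workload:

# Back-of-envelope for the "~$45/day" figure at $3/M input tokens.
# The mix of query types is illustrative; adjust to your agent's workload.
RATE = 3 / 1_000_000                      # dollars per input token
mix = {                                   # question: (tokens saved/call, calls/day)
    "where is X defined":  (7_880, 190),
    "what calls Y":        (14_820, 150),
    "module boundary":     (24_700, 80),
    "PR attack surface":   (79_750, 55),
    "full SSRF audit":     (199_600, 25),
}
daily = sum(tokens * calls * RATE for tokens, calls in mix.values())
print(f"${daily:,.2f}/day  ->  ${daily * 30:,.0f}/month")  # ≈ $45/day -> ~$1,350/month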
ArgosBrain ships an MCP server. Any agent framework that speaks MCP — and in 2026 that's most of them — picks it up automatically.
Python — Anthropic SDK + MCP
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from anthropic import Anthropic

async def main():
    params = StdioServerParameters(command="argosbrain-mcp")
    async with stdio_client(params) as (r, w), ClientSession(r, w) as session:
        await session.initialize()
        # Map MCP tool metadata to Anthropic's tool schema — Claude auto-discovers them
        tools = [{"name": t.name, "description": t.description,
                  "input_schema": t.inputSchema}
                 for t in (await session.list_tools()).tools]  # 35+ tools instantly
        client = Anthropic()
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=4096,
            tools=tools,
            messages=[{"role": "user", "content": "audit security on this codebase"}],
        )

asyncio.run(main())
TypeScript — OpenAI Agent SDK + MCP
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
import OpenAI from "openai";

const argos = new Client({ name: "my-agent", version: "1.0.0" });
await argos.connect(new StdioClientTransport({ command: "argosbrain-mcp" }));

// Map MCP tool metadata to OpenAI's function-tool schema
const { tools: mcpTools } = await argos.listTools();
const tools = mcpTools.map((t) => ({
  type: "function" as const,
  function: { name: t.name, description: t.description, parameters: t.inputSchema },
}));

const openai = new OpenAI();
await openai.chat.completions.create({
  model: "gpt-5",
  tools, // model auto-routes calls to argos
  messages: [{ role: "user", content: "find dead code" }],
});
Rust — custom agent
// MCPClient here stands in for your agent's own stdio MCP client wrapper
// (e.g. built on the rmcp crate); it is not an Argos-specific SDK.
let argos = MCPClient::stdio("argosbrain-mcp").await?;
let result = argos.call_tool("find_sinks",
    json!({"kind": "ssrf", "reachable_only": true})).await?;
LangChain — via langchain-mcp-adapters
import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_anthropic import ChatAnthropic
from langgraph.prebuilt import create_react_agent

async def main():
    async with MultiServerMCPClient({
        "argos": {"command": "argosbrain-mcp", "args": [], "transport": "stdio"}
    }) as client:
        tools = client.get_tools()  # 35+ argos tools
        llm = ChatAnthropic(model="claude-opus-4-7")  # any tool-calling LLM works
        agent = create_react_agent(llm, tools)  # standard LangGraph agent
        response = await agent.ainvoke(
            {"messages": [("user", "audit this codebase for SSRF")]}
        )

asyncio.run(main())
The langchain-mcp-adapters package (official langchain-ai org) wraps any MCP server as LangChain tools. Argos plugs in like any other tool source — no Argos-specific code in your agent graph.
No vendor lock-in. No proprietary embedding format. The agent's existing system prompt + tool-routing decides when to call Argos based on tool descriptions — you don't need to teach it.
Ingest a project, poll progress with chunked async calls (no MCP transport timeout regardless of repo size), triage sinks by exploitability score, and fetch details only for the critical bucket. Copy this, swap the path, run.
import asyncio, json
from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client
PROJECT = "/Users/you/code/your-repo"
async def main():
params = StdioServerParameters(command="argosbrain-mcp")
async with stdio_client(params) as (r, w), ClientSession(r, w) as session:
await session.initialize()
# 1. Background ingest — returns in <1s with started=true
await session.call_tool("ingest_start", {"path": PROJECT})
# 2. Poll — each ingest_wait blocks max 30s, prints pct
while True:
res = await session.call_tool("ingest_wait", {"timeout_seconds": 30})
st = json.loads(res.content[0].text)
print(f"⏳ {st['pct']}% — {st['files_done']:,}/{st['files_total']:,}")
if st["complete"]:
break
# 3. Summary triage — bounded payload, ~2 KB
res = await session.call_tool("triage_sinks",
{"kind": "ssrf", "top_n_per_bucket": 5})
sum_ = json.loads(res.content[0].text)
print(f"🔴 critical {sum_['critical_reachable']} · "
f"🟠 high {sum_['high']} · 🟡 med {sum_['medium']}")
# 4. Drill into critical only — full RankedSink payload
if sum_["critical_ids"]:
det = await session.call_tool("triage_sinks_details",
{"kind": "ssrf", "ids": sum_["critical_ids"]})
print(det.content[0].text)
asyncio.run(main())
Every tool call returns in under 60 s — even on a 3M-LOC monorepo like Kubernetes — because the heavy work runs in a tokio task on the Rust side and the agent polls with short bounded calls instead of blocking. The MCP transport stays warm for the entire ingest.
What triage_sinks actually returns, so the agent (and you) know the data shape before the first call:
{
"kind": "ssrf",
"critical_reachable": 2, // score ≥ 0.7 AND has callers
"high": 8, // score ≥ 0.4
"medium": 23, // score ≥ 0.1
"informational_dead_code": 42, // score < 0.1, typically dead code
"critical_ids": [
"9f23e0bb-...", // pass to triage_sinks_details
"bc01a74d-..."
],
"high_ids": ["..."], // top 5 of each bucket
"total": 75,
"next_step_hint": "Call triage_sinks_details(ids=[9f23..., bc01...]) for paths + signals on the 2 critical finding(s)."
}
Bounded payload (~2 KB) regardless of total hit count — the "Lost in the Middle" failure mode, where 200 KB of fully ranked findings drowns the agent's reasoning, is impossible by construction. The agent chains triage_sinks → renders a summary → asks the user → drills into specific IDs only on demand.
Code intelligence
symbol_exists · resolve_member · list_symbols · find_dead_symbols · find_module_boundaries · naming_convention · check_name · control_flow_path · check_reachability
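A minimal sketch of two of these calls, reusing the initialized ClientSession from the quickstart above. The parameter names ("name", "from", "to") are assumptions; read each tool's inputSchema from list_tools() for the real contract:

# Sketch: code-intelligence calls on the quickstart's initialized session.
# Parameter names here are assumptions; check each tool's inputSchema.
res = await session.call_tool("symbol_exists", {"name": "validateUser"})
print(res.content[0].text)          # structured JSON with a confidence grade

res = await session.call_tool("check_reachability",
                              {"from": "handle_request", "to": "exec_query"})
print(res.content[0].text)          # reachable or not, with rationale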
Retrieval (sub-millisecond)
search · recall · ask · extract_fact · semantic_hop · introspect
Security & audit
find_sinks · triage_sinks (with exploitability score) · triage_sinks_details · find_sinks_delta (for CI / PR review) · ingest_findings (SARIF import for Semgrep / CodeQL / Snyk)
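A hedged sketch of find_sinks_delta as a CI gate, again on the quickstart session. The base/head parameters and the new_sinks field are assumptions, not documented schema:

# Sketch: fail a PR that adds SSRF surface. base/head and new_sinks
# are assumed names; verify against the tool's inputSchema before use.
res = await session.call_tool("find_sinks_delta",
                              {"kind": "ssrf", "base": "main", "head": "HEAD"})
delta = json.loads(res.content[0].text)
if delta.get("new_sinks"):
    raise SystemExit(f"PR adds {len(delta['new_sinks'])} new SSRF sink(s)")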
Ingest (chunked async, no transport timeout)
ingest_start · ingest_status · ingest_wait · ingest_codebase
Memory ops
remember · connect · verify · forget · consolidate · core_memories · temporal_state · discuss · decide · strengthen
Every tool returns structured JSON with explicit confidence grades (CLEAR / AMBIGUOUS / NO_CONFIDENT_MATCH) so your agent doesn't have to guess when to trust top-1 vs. cross-check. Every response carries the rationale — your agent can show its work to the user.
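For example, an agent wrapper can branch on the grade instead of trusting top-1 blindly. A sketch, continuing the quickstart session (the "confidence" field name is an assumption; the grade values are the documented ones):

# Sketch: route on the confidence grade. The "confidence" key is assumed;
# the CLEAR / AMBIGUOUS / NO_CONFIDENT_MATCH values are documented above.
res = await session.call_tool("resolve_member", {"name": "auth.middleware"})
answer = json.loads(res.content[0].text)
grade = answer.get("confidence")
if grade == "CLEAR":
    print("trust top-1:", answer)
elif grade == "AMBIGUOUS":
    # cross-check before acting, e.g. widen the search
    res2 = await session.call_tool("list_symbols", {"name": "auth.middleware"})
    print("cross-check:", res2.content[0].text)
else:  # NO_CONFIDENT_MATCH
    print("fall back to grep or ask the user")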
Rate limits don't apply. Autonomous agents loop. A reviewer agent might call find_sinks 40 times in a row across categories during a single PR review. Cloud APIs throttle you at 60 req/min and bill per call — Argos answers in <5ms in-process with no quota.
Determinism. Same query → same response, byte-identical. Cloud RAG and LLM-backed memory return different chunks every run because the underlying ranker is stochastic. For agent eval, regression detection, and reproducible CI runs, structural retrieval is the only honest answer.
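That property makes retrieval assertable in CI. A minimal pytest sketch, assuming a session fixture wired up as in the quickstart and a "query" parameter for search:

# Sketch: determinism as a CI assertion. Assumes a `session` fixture built
# as in the quickstart, pytest-asyncio installed, and a "query" parameter.
import pytest

@pytest.mark.asyncio
async def test_retrieval_is_deterministic(session):
    a = await session.call_tool("search", {"query": "validateUser"})
    b = await session.call_tool("search", {"query": "validateUser"})
    assert a.content[0].text == b.content[0].text   # byte-identical, every run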
Privacy by construction. Enterprise legal teams don't approve sending proprietary source to the Mem0 / Zep / Letta cloud. Argos is in-process Rust on the developer's machine — the source code never leaves the laptop, the dashboard binds to 127.0.0.1 only, and there is zero outbound traffic. The same property unlocks healthcare, finance, and classified workloads out of the box.
Bonus for multi-agent systems: a Researcher agent stores findings via remember, a Coder agent reads them via recall, a Reviewer agent cross-references via find_sinks. One brain, one project scope, three agents amortising the cost. Token savings compound across the team.
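A sketch of that handoff: two ClientSessions built as in the quickstart, pointed at the same project (the "content" and "query" parameter names are illustrative):

# Sketch: shared memory across agents. Both sessions are ClientSessions
# built as in the quickstart; parameter names here are assumptions.
# Researcher agent writes a finding:
await researcher_session.call_tool("remember",
    {"content": "OrderService calls PaymentGateway over unauthenticated HTTP"})
# Coder agent, in a separate session against the same project, reads it back:
res = await coder_session.call_tool("recall", {"query": "OrderService payment"})
print(res.content[0].text)          # the Researcher's finding, with rationale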
| | Naive RAG | Mem0 / Zep / Letta | LangChain Memory | ArgosBrain |
|---|---|---|---|---|
| Cost per retrieval | $$ (LLM) | $ (API) | $$ (LLM) | $0 |
| Latency | 200–2000 ms | 50–200 ms | 300–1500 ms | <5 ms |
| Determinism | ✗ | ⚠ | ✗ | ✓ |
| Code-aware (tree-sitter) | ✗ | ⚠ partial | ✗ | ✓ 8 langs |
| Local-first (no cloud) | ✗ | ✗ | ✗ | ✓ |
| Cross-session memory | ✗ | ✓ | ⚠ | ✓ + zones |
| Persistent across restarts | ✗ | ✓ | ✗ | ✓ |
| Free tier | n/a | 1k msgs | n/a | Unlimited projects · 20k nodes/project |
Per-vendor head-to-heads with the exact tool surface for each: vs Mem0 · vs Letta · vs Zep · vs LangMem · vs other MCP memory servers.
curl install · MCP server details · LongMemCode benchmark (91.6%) · Architecture paper · Independent verdict · Pricing
Free tier covers most solo agent builds. Pro ($19/seat/mo) unlocks unlimited nodes for production agents. Team Defender ($79/seat/mo) ships the GitHub Action wrapper for autonomous PR-review bots in CI.