ArgosBrain · For AI agent builders

Your agent burns tokens
asking the same questions.
Argos answers them for $0.

A local, deterministic, MCP-native memory layer for autonomous agents. Drop-in for LangChain, OpenAI Agent SDK, Claude Agent SDK, Cline, Cursor, custom agents — anything that speaks Model Context Protocol. Sub-millisecond retrieval. ~150× token reduction on codebase queries. Free tier: unlimited projects, 20k nodes per project.

01 · The problem

Autonomous agents bleed tokens on what should be free.

Build an agent that operates on a real codebase and you'll watch it burn 5,000–50,000 tokens for each "find where X is defined" or "what calls Y." Not because the question is hard — because the standard answer is grep + read + LLM-summarize, and your agent does it on every turn.

At 500 codebase queries per day — modest for any production agent — you're spending $30–$100/day per agent on questions a graph database could answer in 1ms for free. Multiply by your fleet, by your monthly active runs, by the number of times the agent re-asks the same thing because it lost context. The bill compounds. Your margins don't.

The cloud-memory crowd (Mem0, Zep, Letta) trades LLM tokens for API calls, cloud round-trips, and data exfiltration to their servers. Better than naive RAG, but still the wrong shape for code-aware agents, and still rate-limited at scale.

02 · The economics

Concrete math, not marketing.

Per-query cost for a typical autonomous-agent workload, with savings computed on input tokens at $3/M (Claude Sonnet-class pricing; roughly 5× higher on Opus-class):

Operation                                Naive RAG    ArgosBrain   Saved per call
"Where is validateUser defined?"         8,000 tk     ~120 tk      $0.024
"What calls auth.middleware?"            15,000 tk    ~180 tk      $0.045
"Audit codebase for SSRF surface"        200,000 tk   ~400 tk      $0.60
"Module boundary of OrderService"        25,000 tk    ~300 tk      $0.074
"Did this PR add new attack surface?"    80,000 tk    ~250 tk      $0.24

An autonomous agent doing 500 such queries per day on a codebase: ~$45/day → ~$1,350/month saved per agent. The dashboard ships a live token-saved counter so you watch the savings accumulate in real time, not on a vendor's invoice 30 days late.
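The arithmetic is easy to check: the saved-per-call column corresponds to an input-token rate of $3 per million tokens. A quick sanity check in Python:

```python
# Sanity-check the saved-per-call column at a $3/M input-token rate.
RATE_IN = 3 / 1_000_000  # dollars per input token

def saved(naive_tokens: int, argos_tokens: int) -> float:
    """Dollars saved per call when Argos replaces a naive-RAG query."""
    return (naive_tokens - argos_tokens) * RATE_IN

print(round(saved(8_000, 120), 3))    # 0.024, "where is X defined"
print(round(saved(200_000, 400), 2))  # 0.6, full SSRF audit
# 500 mixed queries/day at ~$0.09 average saving is ~$45/day, ~$1,350/month
```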

03 · Drop-in

Five lines. Four ecosystems. No bespoke protocol.

ArgosBrain ships an MCP server. Any agent framework that speaks MCP — and in 2026 that's most of them — picks it up automatically.

Python — Anthropic SDK + MCP

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from anthropic import Anthropic

async def main():
    params = StdioServerParameters(command="argosbrain-mcp")
    async with stdio_client(params) as (read, write), ClientSession(read, write) as session:
        await session.initialize()

        # Convert MCP tool metadata to the Anthropic tools format
        mcp_tools = (await session.list_tools()).tools
        tools = [{"name": t.name, "description": t.description,
                  "input_schema": t.inputSchema} for t in mcp_tools]

        # Pass argosbrain tools straight to Claude — it auto-discovers them
        client = Anthropic()
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,
            tools=tools,  # 35+ tools instantly
            messages=[{"role": "user", "content": "audit security on this codebase"}],
        )

asyncio.run(main())

TypeScript — OpenAI Agent SDK + MCP

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
import OpenAI from "openai";

const argos = new Client({ name: "agent", version: "1.0.0" });
await argos.connect(new StdioClientTransport({ command: "argosbrain-mcp" }));

// Wrap MCP tool metadata as OpenAI function tools
const { tools: mcpTools } = await argos.listTools();
const tools = mcpTools.map((t) => ({
  type: "function" as const,
  function: { name: t.name, description: t.description, parameters: t.inputSchema },
}));

const openai = new OpenAI();
await openai.chat.completions.create({
  model: "gpt-5",
  tools,                       // model auto-routes calls to argos
  messages: [{ role: "user", content: "find dead code" }],
});

Rust — custom agent

let argos = MCPClient::stdio("argosbrain-mcp").await?;
let result = argos.call_tool("find_sinks",
    json!({"kind": "ssrf", "reachable_only": true})).await?;

LangChain — via langchain-mcp-adapters

from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent

async with MultiServerMCPClient({
    "argos": {"command": "argosbrain-mcp", "transport": "stdio"}
}) as client:
    tools = client.get_tools()                  # 35+ argos tools
    agent = create_react_agent(llm, tools)      # standard LangGraph agent
    response = await agent.ainvoke(
        {"messages": [("user", "audit this codebase for SSRF")]}
    )

The langchain-mcp-adapters package (official langchain-ai org) wraps any MCP server as LangChain tools. Argos plugs in like any other tool source — no Argos-specific code in your agent graph.

No vendor lock-in. No proprietary embedding format. The agent's existing system prompt + tool-routing decides when to call Argos based on tool descriptions — you don't need to teach it.
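What the model actually sees is nothing more than tool metadata. A hypothetical descriptor of the shape list_tools returns (the name and description text here are illustrative, not the shipped tool surface):

```python
# A hypothetical Argos tool descriptor as exposed over MCP; the description
# alone is what lets the model decide when to route a call here.
tool = {
    "name": "symbol_exists",
    "description": "Check whether a symbol is defined anywhere in the "
                   "ingested codebase. Returns file, line, and confidence.",
    "inputSchema": {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    },
}
print(tool["name"])  # symbol_exists
```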

Complete agent loop — runnable in 30 lines.

Ingest a project, poll progress chunked-async-style (no MCP transport timeout regardless of repo size), triage sinks by exploitability score, fetch details only on the critical bucket. Copy this, swap the path, run.

import asyncio, json
from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client

PROJECT = "/Users/you/code/your-repo"

async def main():
    params = StdioServerParameters(command="argosbrain-mcp")
    async with stdio_client(params) as (r, w), ClientSession(r, w) as session:
        await session.initialize()

        # 1. Background ingest — returns in <1s with started=true
        await session.call_tool("ingest_start", {"path": PROJECT})

        # 2. Poll — each ingest_wait blocks max 30s, prints pct
        while True:
            res = await session.call_tool("ingest_wait", {"timeout_seconds": 30})
            st  = json.loads(res.content[0].text)
            print(f"⏳ {st['pct']}% — {st['files_done']:,}/{st['files_total']:,}")
            if st["complete"]:
                break

        # 3. Summary triage — bounded payload, ~2 KB
        res = await session.call_tool("triage_sinks",
            {"kind": "ssrf", "top_n_per_bucket": 5})
        sum_ = json.loads(res.content[0].text)
        print(f"🔴 critical {sum_['critical_reachable']} · "
              f"🟠 high {sum_['high']} · 🟡 med {sum_['medium']}")

        # 4. Drill into critical only — full RankedSink payload
        if sum_["critical_ids"]:
            det = await session.call_tool("triage_sinks_details",
                {"kind": "ssrf", "ids": sum_["critical_ids"]})
            print(det.content[0].text)

asyncio.run(main())

Every tool call returns under 60s — even on a 3M-LOC monorepo like Kubernetes — because the heavy work runs in a tokio task on the Rust side and the agent polls structurally instead of blocking. The MCP transport stays warm for the entire ingest.

What triage_sinks actually returns.

So the agent (and you) know the data shape before the first call:

{
  "kind": "ssrf",
  "critical_reachable": 2,         // score ≥ 0.7 AND has callers
  "high": 8,                       // score ≥ 0.4
  "medium": 23,                    // score ≥ 0.1
  "informational_dead_code": 42,   // score < 0.1, typically dead code
  "critical_ids": [
    "9f23e0bb-...",                // pass to triage_sinks_details
    "bc01a74d-..."
  ],
  "high_ids": ["..."],             // top 5 of each bucket
  "total": 75,
  "next_step_hint": "Call triage_sinks_details(ids=[9f23..., bc01...]) for paths + signals on the 2 critical finding(s)."
}

Bounded payload (~2 KB) regardless of total hit count — the "Lost in the Middle" failure mode where 200 KB of full ranked findings drowned the agent's reasoning is impossible by construction. The agent chains triage_sinks → renders summary → asks user → drills into specific IDs only on demand.
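The drill-down decision is mechanical enough to sketch. A hypothetical helper over the payload shape shown above (field names taken from the example; the only-critical policy is illustrative):

```python
import json

def ids_to_drill(summary_text: str) -> list:
    """Pick which sink IDs justify a triage_sinks_details call.
    Policy sketch: fetch full details only for the critical bucket."""
    s = json.loads(summary_text)
    return s["critical_ids"] if s.get("critical_reachable", 0) > 0 else []

payload = '{"critical_reachable": 2, "critical_ids": ["9f23e0bb", "bc01a74d"]}'
print(ids_to_drill(payload))  # ['9f23e0bb', 'bc01a74d']
```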

04 · What your agent gets

35+ MCP tools, organised by category.

Code intelligence

symbol_exists · resolve_member · list_symbols · find_dead_symbols · find_module_boundaries · naming_convention · check_name · control_flow_path · check_reachability

Retrieval (sub-millisecond)

search · recall · ask · extract_fact · semantic_hop · introspect

Security & audit

find_sinks · triage_sinks (with exploitability score) · triage_sinks_details · find_sinks_delta (for CI / PR review) · ingest_findings (SARIF import for Semgrep / CodeQL / Snyk)

Ingest (chunked async, no transport timeout)

ingest_start · ingest_status · ingest_wait · ingest_codebase

Memory ops

remember · connect · verify · forget · consolidate · core_memories · temporal_state · discuss · decide · strengthen

Every tool returns structured JSON with explicit confidence grades (CLEAR / AMBIGUOUS / NO_CONFIDENT_MATCH) so your agent doesn't have to guess when to trust top-1 vs. cross-check. Every response carries the rationale — your agent can show its work to the user.
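A sketch of how an agent might branch on those grades. The grade strings come from the text above; the routing policy and the "confidence" field name are assumptions for illustration:

```python
def route(result: dict) -> str:
    """Map an Argos confidence grade to an agent action (illustrative policy)."""
    grade = result.get("confidence", "NO_CONFIDENT_MATCH")
    if grade == "CLEAR":
        return "use_top_match"   # trust top-1 directly
    if grade == "AMBIGUOUS":
        return "cross_check"     # fetch candidates, let the LLM disambiguate
    return "widen_search"        # NO_CONFIDENT_MATCH: fall back to a broader query

print(route({"confidence": "CLEAR"}))      # use_top_match
print(route({"confidence": "AMBIGUOUS"}))  # cross_check
```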

05 · Why local-first matters for agents specifically

Three properties no cloud memory layer can match.

Rate limits don't apply. Autonomous agents loop. A reviewer agent might call find_sinks 40 times in a row across categories during a single PR review. Cloud APIs throttle you at 60 req/min and bill per call — Argos answers in <5ms in-process with no quota.

Determinism. Same query → same response, byte-identical. Cloud RAG and LLM-backed memory return different chunks every run because the underlying ranker is stochastic. For agent eval, regression detection, and reproducible CI runs, structural retrieval is the only honest answer.
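Determinism is what makes snapshot-style regression tests on retrieval meaningful. A minimal sketch:

```python
import hashlib

def snapshot(response_text: str) -> str:
    """Fingerprint a tool response; byte-identical responses hash identically."""
    return hashlib.sha256(response_text.encode()).hexdigest()[:16]

baseline = snapshot('{"symbol": "validateUser", "file": "auth.py"}')
assert snapshot('{"symbol": "validateUser", "file": "auth.py"}') == baseline
# With a stochastic ranker this assertion flakes; with structural retrieval it can't.
```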

Privacy by construction. Enterprise legal teams don't approve sending proprietary source to Mem0 / Zep / Letta cloud. Argos is in-process Rust on the developer's machine — the source code never leaves the laptop, the dashboard binds to 127.0.0.1 only, zero outbound traffic. The same property unlocks healthcare / finance / classified workloads out of the box.

Bonus for multi-agent systems: a Researcher agent stores findings via remember, a Coder agent reads them via recall, a Reviewer agent cross-references via find_sinks. One brain, one project scope, three agents amortising the cost. Token savings compound across the team.
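The shared-brain flow, sketched with a stand-in session. The remember/recall tool names come from the Memory ops list above; the stub class and payload fields are illustrative, not the real client:

```python
import asyncio

class StubSession:
    """Stand-in for an MCP ClientSession, just to show the flow."""
    def __init__(self):
        self.store = []
    async def call_tool(self, name, args):
        if name == "remember":
            self.store.append(args["content"])
        elif name == "recall":
            return [m for m in self.store if args["query"] in m]

async def demo():
    session = StubSession()
    # Researcher agent writes a finding; Coder agent reads it back.
    # One brain, one project scope, shared by both.
    await session.call_tool("remember", {"content": "SSRF sink in fetch_url()"})
    return await session.call_tool("recall", {"query": "SSRF"})

print(asyncio.run(demo()))  # ['SSRF sink in fetch_url()']
```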

06 · Vs other memory layers

Honest comparison, picked to be unflattering to us where it should be.

                             Naive RAG     Mem0 / Zep / Letta   LangChain Memory   ArgosBrain
Cost per retrieval           $$ (LLM)      $ (API)              $$ (LLM)           $0
Latency                      200–2000 ms   50–200 ms            300–1500 ms        <5 ms
Determinism                  ✗             ✗                    ✗                  ✓
Code-aware (tree-sitter)     ✗             ✗                    ⚠ partial          ✓ 8 langs
Local-first (no cloud)       ✗             ✗                    ✗                  ✓
Cross-session memory         ✗             ✓                    ✓                  ✓ + zones
Persistent across restarts   ✗             ✓                    ✓                  ✓
Free tier                    n/a           1k msgs              n/a                Unlimited projects · 20k nodes/project

Per-vendor head-to-heads with the exact tool surface for each: vs Mem0 · vs Letta · vs Zep · vs LangMem · vs other MCP memory servers.

07 · Next

Install in one line. Drop-in five lines.

curl install · MCP server details · LongMemCode benchmark (91.6%) · Architecture paper · Independent verdict · Pricing

Free tier covers most solo agent builds. Pro ($19/seat/mo) unlocks unlimited nodes for production agents. Team Defender ($79/seat/mo) ships the GitHub Action wrapper for autonomous PR-review bots in CI.