Abstract
We describe Neurogenesis, a graph-first code-memory engine that answers structural retrieval queries for AI coding agents without any LLM call on the read path. The engine ingests source code into a canonical-identifier graph via a tiered pipeline that selects the highest-precision indexing technology available per language — compiler-grade SCIP indexers where mature, live language-server workspaces where not, and bespoke tree-sitter semantic walkers for the long-tail remainder. The retrieval API exposes structural primitives — symbol existence, member resolution, containment enumeration, call-graph traversal, override resolution — directly as deterministic graph operations. File-hash change detection invalidates stale subgraphs on source-tree changes, making re-ingest cost linear in the number of changed files rather than in the repository size. Ingest operates in isolated subprocesses with bounded lifetimes; a crashing language server cannot affect the retrieval hot path, which is in-process Rust reading from local bincode-serialised graph storage. We report P99 retrieval latency at or below one millisecond across 16 benchmark corpora, a memory footprint in the low hundreds of megabytes for repositories of several hundred thousand symbols, and zero monetary cost per thousand retrieval queries. We discuss the design-space alternatives we rejected and the limitations that remain.
Introduction
AI coding agents that persist knowledge between sessions need a memory layer whose cost, latency, and accuracy match the expectations of interactive developer work. Three cost dimensions matter: dollars per query (charged by embedding or LLM API calls on the retrieval path), milliseconds per query (P99 matters more than P50 for interactive use), and staleness after source-tree changes (a refactor that renames several hundred symbols should not require re-ingesting the entire repository). Existing general-purpose memory systems for agents typically optimise one dimension at the expense of the others: dollar-cheap retrieval at the cost of LLM calls on writes; fast retrieval at the cost of accuracy on structural code queries; accurate retrieval at the cost of expensive re-ingestion.
This paper describes Neurogenesis, a memory engine specifically designed for the code-memory workload identified in companion work [Jibleanu, 2026a; Jibleanu, 2026b]. Neurogenesis optimises for the structural-query-dominated distribution coding agents actually issue, and accepts the corresponding design constraints: a graph-first storage layer, compiler-grade ingest where possible, a zero-LLM hot path, and content-hash-based incremental updates. The engine serves as the reference adapter in the LongMemCode benchmark [Jibleanu, 2026a] and is the subject of the measurements reported there.
The contributions of this paper are: (1) a high-level architecture description of a tiered, graph-first, in-process code-memory engine; (2) a justification of each design choice against alternatives, grounded in the structural-versus-semantic taxonomy [Jibleanu, 2026b]; (3) measured operational properties — latency, footprint, re-ingest cost — for the engine running against real open-source corpora; and (4) an explicit discussion of design-space limits and open problems.
Related Work
4.1 Knowledge-graph memory for agents
Graphiti [Rasmussen et al., 2025] and MemGPT / Letta [Packer et al., 2023] are the dominant graph-based agent-memory systems in production use. Graphiti treats memory as a temporal knowledge graph of entities and labelled relations, extracted via LLM from conversational or documentary input, and requires an external graph database (Neo4j, FalkorDB, Kuzu, or Neptune); Letta maintains a tiered core/archival/recall structure edited by agent self-calls. Both pay LLM cost on write and, in Letta’s case, on read as well. Neither ingests source code as canonical-identifier graphs, and neither exposes structural-code-query primitives.
4.2 Retrieval-augmented code completion
Continue’s @codebase [Continue, 2025] parses source with tree-sitter, embeds top-level function and class bodies, and retrieves top-k chunks on demand. The chunks are text; the retrieval is semantic. Aider’s repository map [Aider, 2023] extracts tree-sitter symbols and ranks files by PageRank over reference edges, injecting the top-ranked identifiers into every prompt. Neither system builds a traversable graph of canonical identifiers, and neither supports queries such as “who overrides method m” or “who calls function f” without falling back to text search.
4.3 Industrial code indexers
SCIP [Sourcegraph, 2023] is an open-source protocol for representing source-code indexing data. SCIP indexers exist for Rust (via rust-analyzer), Python (via a patched pyright), Go (via scip-go), TypeScript / JavaScript (scip-typescript), Java and Scala (via semanticdb and scip-java), PHP (scip-php), Ruby (scip-ruby), C# (scip-dotnet), and Dart (scip_dart). Sourcegraph uses SCIP to power cross-repository code search across billions of lines of code. SCIP is an ingestion format; it is not a memory engine, nor does it expose retrieval APIs designed for agent consumption. Neurogenesis consumes SCIP as one of its ingest backends, alongside others.
4.4 Language-server protocol indices
The Language Server Protocol [Microsoft, 2016] provides textDocument/documentSymbol and workspace/symbol as primitives that can be used to enumerate symbols in a workspace. Some language ecosystems (Kotlin, Swift) have mature language servers but no production-ready SCIP indexer. We use live LSP ingest opportunistically in those cases.
4.5 Tree-sitter-based semantic extraction
Tree-sitter [Brunsfeld, 2018] is an incremental parser-generator framework with grammars for over 100 languages. It produces concrete syntax trees; it does not perform cross-file symbol resolution, type inference, or import resolution. Using tree-sitter for semantic extraction requires per-language walker logic that maps CST nodes to canonical identifiers — a substantial engineering effort per language but the only option for languages without mature SCIP or LSP support.
Design Goals
Neurogenesis is designed against four explicit goals.
G1. Structural correctness at compiler-grade depth. For every language in the target set, structural queries — does this symbol exist, list methods of a class, enumerate overrides — must return exact, reproducible answers. This rules out approximate retrieval on the structural path.
G2. Sub-millisecond P99 retrieval at laptop resource budget. Interactive coding UX lives at the tail. A memory layer that serves an agent mid-task cannot pause the user. Retrieval must be graph-local and in-process; retrieval cannot call out to external services or spawn subprocesses per query.
G3. Zero monetary cost on the retrieval path, forever. The read path must never call an LLM, never call an embedding API, never make a network request. This constrains the storage model (all structure must be pre-computed at ingest time) but removes an entire class of operational failure modes.
G4. Re-ingest cost linear in the diff, not in the repository. Developer workflows issue branch switches, rebases, and partial edits constantly. A memory engine whose ingest cost is proportional to the repository size creates back-pressure on normal git operation. Re-ingest must be O(changed files).
These four goals constrain the design space severely. Most commercial agent-memory products satisfy two or three; we argue Neurogenesis is among the first to satisfy all four on the code-memory workload, at the cost of narrowing the target domain from general memory to code specifically.
Architecture
6.1 Components
Neurogenesis consists of three components, connected in a pipeline:
- Ingest pipeline: consumes a source-tree commit SHA and produces a canonical-identifier graph persisted to local on-disk storage.
- Graph store: on-disk bincode-serialised graph, with an in-memory working set for query serving.
- Retrieval API: exposes the structural query primitives over a stable protocol (MCP stdio for the production deployment, but the API surface is transport-independent).
A persistent file-watcher component is optional and handles the O(changed files) incremental update path.
Figure 1 shows these components as a block diagram, at a level of detail that conveys the architecture without exposing internal types.
6.2 Tiered ingest pipeline
The ingest pipeline selects one of three backend strategies per language, chosen to maximise structural precision given the tooling available for that language.
Tier 1 — Compiler-grade SCIP indexing. For languages with a mature SCIP indexer, ingest drives the indexer against the source tree. The indexer runs the language’s compiler frontend and produces a SCIP index containing canonical symbol IDs, cross-file references, containment relations, and type information. Neurogenesis parses the SCIP index and inserts its nodes and edges directly into the graph. The indexer subprocess terminates at the end of ingest; no long-lived process is required.
Tier 2 — Live language-server ingest. For languages with a mature language server but no production SCIP indexer, ingest drives the language server over LSP. The workspace is opened, documentSymbol and workspace/symbol queries enumerate the symbols, and per-language post-processing maps LSP symbol kinds back to our canonical schema. Ingest is guarded by per-file and per-session timeouts, and the language server runs as an isolated subprocess whose lifetime is bounded by the ingest run.
Tier 3 — Bespoke tree-sitter semantic walkers. For languages without either a SCIP indexer or a mature language server, ingest uses tree-sitter grammars augmented with per-language semantic hooks that extract canonical identifiers from the concrete syntax tree. These walkers encode language-specific structural patterns — for example, languages where functions are defined by assignment rather than declaration require recognising assignment-to-function as a function-declaration event; statement-based grammars require per-statement parsing with context preservation; languages with block-label semantics require label-aware walkers. The walkers do not perform cross-file type inference, so their output is structurally shallower than Tier 1 but considerably richer than a generic tree-sitter surface extraction.
The tier is selected at build time based on the source tree’s detected languages. A single ingest run may use all three tiers in parallel across different file subsets of the same repository.
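As a sketch, the per-language tier choice reduces to a total function from detected language to backend tier. The language names and tier assignments below are illustrative placeholders, not the shipped mapping, and the detection logic itself is not disclosed in this paper.

```rust
/// Ingest backend tiers, ordered by structural precision.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Scip,       // Tier 1: compiler-grade SCIP indexer
    Lsp,        // Tier 2: live language-server ingest
    TreeSitter, // Tier 3: bespoke tree-sitter semantic walker
}

/// Select the highest-precision tier available for a detected language.
/// The assignments here are illustrative only.
pub fn select_tier(language: &str) -> Tier {
    match language {
        // Languages with a mature SCIP indexer (illustrative subset).
        "rust" | "go" | "typescript" | "python" | "java" => Tier::Scip,
        // Mature language server, no production SCIP indexer.
        "kotlin" | "swift" => Tier::Lsp,
        // The long-tail remainder falls back to tree-sitter walkers.
        _ => Tier::TreeSitter,
    }
}
```

Because the function is total, every detected language lands on some tier, which is what lets a single ingest run fan out across all three tiers for different file subsets.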
6.3 Graph storage
The graph is a set of nodes representing canonical identifiers and a set of labelled edges representing structural relations between them. Edges carry a label from a fixed schema — containment, reference, inheritance, override, and similar — derived from the tier’s source index.
The graph is persisted on disk in a compact binary serialisation. The hot working set is mapped into memory at retrieval-server startup; cold portions are spilled to disk with an LRU-like policy. Retrieval does not allocate on the typical path: a query walks pre-materialised edges in memory and returns a set of canonical-identifier strings.
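A minimal sketch of this storage model, with hypothetical type names (CodeGraph, EdgeLabel) standing in for the engine's internal types:

```rust
use std::collections::HashMap;

/// Edge labels drawn from the fixed schema described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EdgeLabel {
    Contains,
    References,
    Inherits,
    Overrides,
}

/// Canonical-identifier graph: nodes are identifier strings, edges are
/// pre-materialised adjacency lists keyed by source node.
#[derive(Default)]
pub struct CodeGraph {
    edges: HashMap<String, Vec<(EdgeLabel, String)>>,
}

impl CodeGraph {
    pub fn add_edge(&mut self, from: &str, label: EdgeLabel, to: &str) {
        self.edges
            .entry(from.to_string())
            .or_default()
            .push((label, to.to_string()));
    }

    /// Walk one labelled edge type from a node. The only allocation is
    /// the returned result set; the traversal itself borrows stored data.
    pub fn neighbours(&self, from: &str, label: EdgeLabel) -> Vec<&str> {
        self.edges
            .get(from)
            .map(|es| {
                es.iter()
                    .filter(|(l, _)| *l == label)
                    .map(|(_, t)| t.as_str())
                    .collect()
            })
            .unwrap_or_default()
    }
}
```

An absent node yields an empty result set rather than an error, which is the behaviour the structural query surface relies on.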
6.4 Retrieval API
The retrieval API exposes structural primitives as named operations over the graph. The operation surface in the production deployment includes symbol existence checks, member resolution, containment enumeration, caller enumeration, override enumeration, and a small number of convenience operations for common agent workflows. Each operation translates to a deterministic graph query with predictable latency profile.
The API is transport-independent: the same operations are exposed over MCP stdio for IDE integration and over an in-process Rust interface for embedded use.
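To make the operation surface concrete, here is a hedged sketch of two of the primitives; the names and index shapes are illustrative, not the shipped MCP schema. The point worth noting is that a hallucinated identifier deterministically yields false or an empty set, never a nearest-neighbour guess.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative pre-built index backing two retrieval primitives.
pub struct RetrievalIndex {
    pub symbols: HashSet<String>,
    pub members: HashMap<String, Vec<String>>,
}

impl RetrievalIndex {
    /// Symbol existence is a set lookup: an identifier the agent invented
    /// returns false rather than a similar-looking match.
    pub fn symbol_exists(&self, id: &str) -> bool {
        self.symbols.contains(id)
    }

    /// Member resolution enumerates pre-materialised containment edges;
    /// an unknown container yields an empty slice, not an error.
    pub fn members_of(&self, container: &str) -> &[String] {
        self.members.get(container).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Both operations are pure lookups over structure computed at ingest time, which is what gives each operation its predictable latency profile.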
6.5 Staleness and incremental update
On ingest, every file carries a content hash (a collision-resistant hash of the file bytes) stored alongside its canonical-identifier nodes. On re-ingest, each source file’s current hash is compared against the stored hash; files whose hashes match are skipped entirely without parsing. Files whose hashes differ have their existing subgraph removed and rebuilt. The cost of re-ingest is therefore proportional to the number of changed files, not the repository size.
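The skip logic can be sketched as follows. The sketch uses std's DefaultHasher to stay dependency-free; as stated above, the production engine uses a collision-resistant hash, and the function names here are illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash a file's bytes. A real engine would use a collision-resistant
/// hash (e.g. a SHA-2 family function); DefaultHasher keeps this std-only.
pub fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Partition files into a skip set (unchanged) and a rebuild set (changed
/// or new), making re-ingest cost proportional to the diff.
pub fn plan_reingest<'a>(
    stored: &HashMap<&'a str, u64>,
    current: &[(&'a str, &'a [u8])],
) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut skip = Vec::new();
    let mut rebuild = Vec::new();
    for &(path, bytes) in current {
        match stored.get(path) {
            // Hash matches: the file is unchanged; its subgraph is kept.
            Some(&h) if h == content_hash(bytes) => skip.push(path),
            // Changed or newly added: remove and rebuild its subgraph.
            _ => rebuild.push(path),
        }
    }
    (skip, rebuild)
}
```

Files in the skip set are never parsed at all, which is where the O(changed files) bound comes from.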
An optional file-watcher component observes the source tree between ingest runs and updates the graph incrementally on save events. The watcher is guarded by directory skip-lists (excluding build output and dependency folders), debouncing (to fold rapid sequences of save events from editors using atomic-save patterns), and per-subtree rate limits (to prevent runaway processes from wedging the host). Watcher operation is opt-in; the pull-based ingest path remains the correctness path.
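The debouncing guard can be sketched as a throttle over per-path event times. The window length below is an illustrative value, not the shipped default, and a production debouncer would typically also delay work until a burst goes quiet rather than merely suppressing duplicates inside the window.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Folds rapid save-event bursts (e.g. editor atomic-save patterns) into
/// at most one re-ingest per file per window.
pub struct Debouncer {
    window: Duration,
    last_fired: HashMap<String, Instant>,
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, last_fired: HashMap::new() }
    }

    /// True if this event should trigger work now; false if it falls
    /// inside the window opened by a previous event on the same path.
    pub fn should_fire(&mut self, path: &str, now: Instant) -> bool {
        let within_window = self
            .last_fired
            .get(path)
            .map_or(false, |&prev| now.duration_since(prev) < self.window);
        if within_window {
            false
        } else {
            self.last_fired.insert(path.to_string(), now);
            true
        }
    }
}
```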
6.6 Subprocess isolation and zero-panic guarantees
All external processes — SCIP indexers, language servers, tree-sitter walker invocations — run as operating-system subprocesses with explicit lifetime bounds. When the ingest run ends, subprocesses are killed. A subprocess crash surfaces as a Result::Err in the Rust parent; it cannot propagate as a panic into the retrieval path.
The retrieval hot path — the MCP stdio loop that serves the agent — is written without unwrap() in library code. Every fallible operation returns Result. The retrieval path never spawns subprocesses, never performs I/O beyond reading from the local graph store, and holds no locks that an ingest path holds. Ingest and retrieval are independent execution domains that share the graph through a controlled write-snapshot protocol.
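The lifetime bound can be sketched with std alone by polling try_wait against a deadline. The function name and poll interval are illustrative; a real implementation might use a wait-with-timeout primitive instead.

```rust
use std::io;
use std::process::Command;
use std::thread;
use std::time::{Duration, Instant};

/// Run an ingest backend as an isolated subprocess with a bounded
/// lifetime. A crash or hang surfaces as an Err in the parent; nothing
/// panics, and the retrieval path is never involved.
pub fn run_bounded(cmd: &mut Command, timeout: Duration) -> io::Result<i32> {
    let mut child = cmd.spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait()? {
            // Exited: report the code; a crash is data, not a panic.
            Some(status) => return Ok(status.code().unwrap_or(-1)),
            // Deadline passed: enforce the lifetime bound.
            None if Instant::now() >= deadline => {
                child.kill()?;
                child.wait()?; // reap the killed child
                return Err(io::Error::new(
                    io::ErrorKind::TimedOut,
                    "ingest backend exceeded its lifetime bound",
                ));
            }
            // Still running: poll again shortly.
            None => thread::sleep(Duration::from_millis(20)),
        }
    }
}
```

The caller sees only Ok(exit code) or Err; the operating-system process boundary is what makes the isolation guarantee hold regardless of what the backend does.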
6.7 Block diagram
Figure 1 — Component block diagram.
A simple block diagram. Three rows. Top row: source tree on the left, tiered ingest pipeline (three boxes labelled Tier 1 SCIP, Tier 2 LSP, Tier 3 tree-sitter) in the middle, arrow to the right. Middle row: graph store as a single cylinder, in-memory working set above it. Bottom row: retrieval API as a box at right, MCP stdio as the transport on the far right, agent symbol on the right edge. No internal types, no parameters, no specific languages labelled against tiers. What this figure shows: how the three components connect. What it deliberately does not show: internal storage layout, specific parameter values, per-language tier assignments, or any detail that would enable implementation replication.
Engineering Properties
7.1 Measured latency
Retrieval P99 latency across the 16 corpora of LongMemCode is at or below 0.82 milliseconds in the worst case, and below 0.1 milliseconds for the majority of categories. Latency is dominated by the cost of edge traversal plus result serialisation; there is no component of the retrieval path that scales with repository size given a bounded result set. Figure 2 shows the full cumulative distribution function of per-query latency across the benchmark.
Figure 2 — Per-query latency CDF across LongMemCode.
A cumulative distribution function chart. X-axis: per-query latency in milliseconds, log scale. Y-axis: fraction of queries at or below that latency, from 0 to 1. A single curve representing the flat union of per-query timings across 16 corpora and all nine categories. Source data: LongMemCode run JSONL files at the submission commit. What this figure shows: the latency distribution has no long tail — the curve reaches the top within two orders of magnitude of the median. What it deliberately does not show: per-corpus breakdown, or any architectural attribution for why the tail is short.
7.2 Measured memory footprint
Memory footprint, measured as resident-set size during steady-state query serving, is in the low hundreds of megabytes for repositories of several hundred thousand symbols. Footprint scales approximately linearly with the number of stored nodes and edges, with a constant factor set by the serialisation format and the in-memory index structures.
Limits at extreme scale. The measurements in this paper cover repositories up to the scale of the largest corpora in LongMemCode (several hundred thousand symbols). We have not benchmarked repositories in the Linux-kernel or Chromium class (on the order of several million symbols). At that scale an all-in-memory graph would cross the tens-of-gigabytes threshold and become impractical on laptop-class hardware. The architecture anticipates this by leaving room for a tiered-storage layer: hot subgraphs remain in process memory, cold subgraphs spill to a local key-value store (SQLite, RocksDB, or LMDB are the obvious candidates). The retrieval API does not change — a cold-tier fetch becomes a hidden I/O inside a traversal step, with a latency tax that can be measured and reported per query class. We flag the tiered-storage extension here as a deliberate scope boundary rather than an oversight; every latency and footprint claim in the present paper is bounded to the measured scale.
7.3 Measured cost
Retrieval has no monetary cost per query. There is no LLM call, no embedding call, no external API call on the read path. The ingest cost is one-time per changed file: running the tier’s backend on the file, parsing its output, and inserting into the graph. Compilation or tree-sitter parsing cost is the dominant term.
Figure 3 — Cost per thousand retrieval queries, comparative.
A horizontal bar chart. Y-axis: systems (Neurogenesis / structural reference, plus placeholder bars for any other adapter present in LongMemCode at submission time). X-axis: cost in US dollars per 1 000 retrieval queries, log scale. Source data: Neurogenesis at $0 (measured, no LLM on read path); other systems inferred from their publicly documented pricing and the prompt tokens they inject per query (exact method described in the caption). What this figure shows: the architectural choice of zero-LLM retrieval produces an order-of-magnitude cost gap versus any system that injects retrieved content into an LLM prompt. What it deliberately does not show: internal explanation of how zero-LLM retrieval is achieved — that is the architecture itself.
7.4 Re-ingest cost
Re-ingest on a zero-diff source tree (no file content changes) completes in under five seconds for a large repository. Re-ingest after a three-hundred-file diff completes in a few seconds for compiler-grade-ingested languages and sub-second for tree-sitter-ingested languages. The cost is linear in the number of changed files.
7.5 Zero-panic property
The retrieval hot path has no unwrap() in library code; every fallible operation is threaded through Result. Ingest subprocesses cannot propagate panics into the retrieval path because they are separated by operating-system process boundaries. A malformed input file fails ingest for that file, logs a warning, and does not block the ingest run from completing or the retrieval server from serving previously-ingested queries.
Design-Space Alternatives
8.1 Vector-only storage
A vector-only memory engine would embed each code chunk and retrieve via similarity. This is the default paradigm in the LLM application layer. We reject it for Neurogenesis because the structural-query distribution [Jibleanu, 2026b] penalises it: vector retrieval cannot natively return empty sets for hallucinated identifiers, cannot enumerate overrides, cannot follow inheritance edges. A vector component is complementary to the graph — Neurogenesis can coexist with one — but it cannot replace the graph for structural workloads.
8.2 External graph database
An external graph database (Neo4j, FalkorDB, Kuzu, Neptune) is the path taken by Graphiti and Zep. We reject it because it violates Goal G2 (sub-millisecond P99 at laptop resource budget): network round-trip costs dominate graph-local traversal costs. In-process Rust storage gives us deterministic latency; database-backed storage does not.
8.3 LLM-in-the-loop on the retrieval path
The read path of Letta, and of its predecessor MemGPT, calls an LLM tool. We reject the pattern because it violates Goal G3 (zero monetary cost per query). A memory engine that charges per read scales its cost with agent usage; a memory engine that pre-computes its structure at ingest and serves from that structure does not.
8.4 Incremental indexer running against the source
Some industrial code-intelligence products operate an incremental indexer that continuously maintains an up-to-date index against the source tree. We reject the continuous path in favour of a content-hash pull model for two reasons: it simplifies operation (there is no daemon to monitor), and it makes the cost of re-ingest explicitly attributable rather than amortised into background CPU use.
8.5 Full tree-sitter everywhere
A simpler design would use tree-sitter for every language rather than a tiered pipeline. We reject it because tree-sitter produces surface syntax and does not perform cross-file symbol resolution. The head of the language distribution — where most real code is written — has compiler-grade indexing available, and using it produces substantially richer graphs. The tiered approach pays extra engineering cost upfront to hit the richer indexing when available.
Limitations
9.1 Tier coverage is uneven
Tier 1 (compiler-grade SCIP) covers a subset of the languages we target. Tier 2 (live LSP) covers languages with mature language servers but no SCIP indexer. Tier 3 (tree-sitter walkers) covers the remainder. The structural richness of the resulting graph is correspondingly uneven: a refactor-audit query on a Tier 1 language is backed by cross-file type resolution; the same query on a Tier 3 language is backed by syntactic inference with known gaps. Users working primarily in Tier 3 languages will see a larger residual gap between Neurogenesis and a hypothetical perfect indexer than users working in Tier 1 languages.
9.2 Ingest is not instant
The content-hash skip makes re-ingest O(changed files), but the first-time ingest of a repository pays the full cost of running every file’s tier backend. On large repositories, first-time ingest can take minutes. We consider this acceptable — it amortises across sessions — but we name it explicitly.
9.3 Tier 2 inherits language-server variance
Live LSP ingest is gated by language-server quality. Language servers are notorious for memory leaks, crashes, and workspace-load latency variance. The subprocess-isolation and timeout model (Section 6.6) bounds the blast radius but does not eliminate it: an ingest run against an uncooperative language server takes longer or fails on that language specifically, without affecting retrieval availability for already-ingested data.
9.4 Semantic queries require additional infrastructure
As argued in companion work [Jibleanu, 2026b], structural and semantic queries are distinct. Neurogenesis in its currently described form handles structural queries. A complete production memory layer for coding agents benefits from a companion semantic-retrieval component, which can share the same ingest pass but uses an embedding index alongside the graph. We do not describe such a companion in this paper.
9.5 Team sync is unimplemented
Neurogenesis is local-first by design: ingest, storage, and retrieval all happen in-process. Multi-user team memory with shared indices and synchronisation across user accounts is not implemented. Enterprises with these requirements today should use conversational-memory products with team-sync support; we discuss this gap as future work.
Related Work Revisited: Comparison Table
We conclude by situating Neurogenesis against adjacent systems along the four design goals. Table 1 summarises the comparison.
| System | G1 Structural correctness | G2 Sub-ms P99 | G3 Zero-cost read | G4 O(diff) re-ingest |
|---|---|---|---|---|
| Neurogenesis | Yes (tiered ingest) | Yes | Yes | Yes (content-hash) |
| Graphiti / Zep | Partial (entity extraction via LLM) | No (external DB) | Yes (free traversal) | No |
| Mem0 | No (semantic only) | Yes | Partial | No |
| Letta | No (agent-driven text memory) | No (LLM in read) | No (LLM in read) | No |
| Cursor Memories | No (prompt injection) | N/A (not a retrieval query) | No (prompt tokens) | No |
| Continue @codebase | Partial (tree-sitter chunks) | No (embedding lookup) | No (prompt tokens) | No |
| Aider repo map | Partial (surface names + PageRank) | N/A (re-computed per request) | No (1 000 tokens / req) | N/A (stateless) |
Conclusion and Future Work
We have described Neurogenesis, a graph-first code-memory engine designed for AI coding agents on inner-loop workloads. The engine satisfies four design goals simultaneously — structural correctness at compiler-grade depth, sub-millisecond P99 retrieval, zero monetary cost per read, and O(changed files) re-ingest — by choosing a tiered ingest pipeline, in-process Rust graph storage, and a retrieval API that exposes structural graph primitives directly. Measured latency on the LongMemCode benchmark is sub-millisecond P99 across all corpora tested; memory footprint is bounded by the graph size and fits in laptop resource budgets for realistic repositories.
Future work falls into three branches. First, expanding tier coverage, particularly Tier 1 (SCIP) support for additional languages as upstream indexers mature. Second, companion semantic-retrieval infrastructure that shares ingest with the graph and addresses the non-structural portion of the coding-agent query distribution. Third, team-sync and multi-user deployment patterns that preserve the local-first operational model while allowing shared indices for collaborative workflows.
References
@misc{aider2023repomap,
title={Repository Map: Scaling to Large Codebases with Tree-sitter and PageRank},
author={{Aider Team}},
year={2023},
url={https://aider.chat/2023/10/22/repomap.html}
}
@misc{brunsfeld2018treesitter,
title={Tree-sitter: An Incremental Parsing System for Programming Tools},
author={Brunsfeld, Max},
year={2018},
url={https://tree-sitter.github.io/tree-sitter/}
}
@misc{chhikara2025mem0,
title={Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory},
author={Chhikara, Prateek and others},
year={2025},
note={arXiv preprint arXiv:2504.19413}
}
@misc{continue2025codebase,
title={@codebase Retrieval Architecture},
author={{Continue Dev Team}},
year={2025},
url={https://docs.continue.dev/customize/deep-dives/codebase}
}
@misc{jibleanu2026longmemcode,
title={LongMemCode: A Deterministic Benchmark for Code-Memory in AI Agents},
author={Jibleanu, Aurelian},
year={2026},
note={Companion paper and MIT-licensed benchmark repository}
}
@misc{jibleanu2026taxonomy,
title={Structural vs Semantic Retrieval in Code-Memory: A Query-Type Taxonomy},
author={Jibleanu, Aurelian},
year={2026},
note={Companion paper}
}
@misc{microsoft2016lsp,
title={Language Server Protocol Specification},
author={{Microsoft}},
year={2016},
url={https://microsoft.github.io/language-server-protocol/}
}
@misc{packer2023memgpt,
title={MemGPT: Towards LLMs as Operating Systems},
author={Packer, Charles and others},
year={2023},
note={arXiv preprint arXiv:2310.08560}
}
@misc{rasmussen2025zep,
title={Zep: A Temporal Knowledge Graph Architecture for Agent Memory},
author={Rasmussen, Preston and others},
year={2025},
note={arXiv preprint arXiv:2501.13956}
}
@misc{sourcegraph2023scip,
title={SCIP: The Source Code Intelligence Protocol},
author={{Sourcegraph}},
year={2023},
url={https://github.com/sourcegraph/scip}
}
Appendices
13.1 Appendix A — Protocol: ingest backend abstraction
Pseudocode interface for the abstract ingest backend (not the Rust trait definition — a simplified pseudocode that conveys the shape without disclosing the trait’s internals). One page. Signatures only. No implementation bodies.
13.2 Appendix B — Protocol: retrieval API surface
The MCP-exposed retrieval operations, with expected input and output shapes. This is already public in the MCP schema we ship, so it is safe to reproduce here. One page.