We publish our work in the open so you can check every number on this site without taking our word for it. Every paper here is licensed CC BY 4.0, and the LongMemCode benchmark is MIT-licensed. The engine itself is commercial, but the benchmark repo includes adapter stubs for competing systems, so anyone can reproduce the headline numbers. If something is wrong, the data is on GitHub for you to prove it.
The benchmark now includes a scale corpus (Kubernetes v1.32.0, 38 771 project symbols, 1 456 scenarios) and two new reference adapters: sbert-faiss, a dense-semantic baseline with no LLM calls, and mem0 @ gpt-4o-mini, Mem0's open-source general-purpose memory system evaluated on a code-structural benchmark. A low score for a general-purpose memory system here measures how well its scope matches this workload, not how well it does its core job: Mem0 is designed for conversational memory. The linked report gives the full framing, the per-category breakdown, and the exact configuration of every adapter.
Paper 1 specifies the LongMemCode benchmark and reports baselines; it is the empirical anchor the other papers cite. Paper 2 argues for the structural-versus-semantic split as a design principle. Paper 3 describes Neurogenesis, the graph-first engine behind ArgosBrain, in enough detail to reproduce its retrieval behaviour. Engineering writeups, field reports, and case studies live on the blog; this collection is reserved for research written in peer-reviewed-paper form.
We introduce LongMemCode, a public benchmark for evaluating the retrieval component of memory systems used by AI coding agents. Existing benchmarks measure either conversational long-term memory (LongMemEval, LoCoMo) or end-to-end agent task success (SWE-bench); none isolates the retrieval quality, speed, and compression of a memory system under coding-agent workloads.
We propose a taxonomy of retrieval queries for AI coding agents and argue that code-memory systems require separate treatment of two query classes rather than a unified retrieval layer. Structural queries admit exact answers derivable from a semantic graph of canonical identifiers; semantic queries are best served by vector retrieval over embedded code chunks.
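As an illustration only, the two-class split can be sketched as a router that answers queries of a fixed structural shape by exact graph lookup and sends everything else to similarity ranking. The query grammar (`callers of X` and similar), the `SymbolGraph` class, and the bag-of-words `embed` function are hypothetical stand-ins, not Paper 2's taxonomy or a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SymbolGraph:
    """Hypothetical graph: canonical identifier -> relation -> target identifiers."""
    edges: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def add(self, src: str, relation: str, dst: str) -> None:
        self.edges.setdefault(src, {}).setdefault(relation, []).append(dst)

    def lookup(self, symbol: str, relation: str) -> list[str]:
        # Exact answer: the edge set either exists or it does not.
        return sorted(self.edges.get(symbol, {}).get(relation, []))


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: a bag-of-words term-count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical grammar for structural queries: "<relation> of <canonical id>".
STRUCTURAL = re.compile(r"^(callers|callees|definition|implementers) of (\S+)$")


def route(query: str, graph: SymbolGraph, chunks: dict[str, str], k: int = 1):
    """Answer structural queries from the graph, everything else by similarity."""
    match = STRUCTURAL.match(query.strip())
    if match:
        relation, symbol = match.groups()
        return "structural", graph.lookup(symbol, relation)
    q = embed(query)
    ranked = sorted(chunks, key=lambda cid: cosine(q, embed(chunks[cid])), reverse=True)
    return "semantic", ranked[:k]
```

After `graph.add("pkg.sched.Schedule", "callers", "pkg.cmd.main")`, the query `callers of pkg.sched.Schedule` returns `("structural", ["pkg.cmd.main"])`, while a free-text question falls through to the ranked chunks. The point of the split is that the structural branch either has the exact answer or returns nothing; it never returns a merely similar one.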
We describe Neurogenesis, a graph-first code-memory engine that answers structural retrieval queries for AI coding agents without any LLM call on the read path. The engine ingests source code into a canonical-identifier graph via a tiered pipeline that selects the highest-precision indexing technology available per language.
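A minimal sketch of per-language tier selection follows, assuming each indexing technology can be given a precision rank and an availability flag. The tier names, ranks, and language sets below are hypothetical placeholders; Paper 3, not this sketch, defines the tiers Neurogenesis actually uses.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Indexer:
    name: str
    precision: int              # higher = more precise (hypothetical ranking)
    languages: frozenset[str]   # "*" means "any language"
    available: bool = True      # e.g. the required toolchain is installed


def select_indexer(language: str, indexers: list[Indexer]) -> Indexer:
    """Pick the most precise available indexer that supports `language`."""
    candidates = [
        ix for ix in indexers
        if ix.available and (language in ix.languages or "*" in ix.languages)
    ]
    if not candidates:
        raise LookupError(f"no indexer available for {language!r}")
    return max(candidates, key=lambda ix: ix.precision)


# Hypothetical tiers; list order does not matter, only `precision` does.
TIERS = [
    Indexer("compiler-backed", 3, frozenset({"go", "java", "typescript"})),
    Indexer("tree-sitter", 2, frozenset({"go", "python", "rust", "typescript"})),
    Indexer("lexical-fallback", 1, frozenset({"*"})),
]

if __name__ == "__main__":
    for lang in ("go", "python", "cobol"):
        print(lang, "->", select_indexer(lang, TIERS).name)
    # Simulate a missing Go toolchain: Go drops to the next tier down.
    degraded = [replace(ix, available=False) if ix.name == "compiler-backed" else ix for ix in TIERS]
    print("go (no toolchain) ->", select_indexer("go", degraded).name)
```

With these placeholder tiers, Go goes to the compiler-backed tier, Python to tree-sitter, and an unlisted language to the lexical fallback; marking a tier unavailable demotes its languages to the next tier rather than failing the whole ingest.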
Cite. All three are preprints (April 2026). Use the arXiv identifiers once they are assigned; until then, cite the title, the author (Aurelian Jibleanu, Neurogenesis), the year (2026), and the canonical URL on this site. A BibTeX block will be published alongside each paper once the arXiv IDs are assigned.
Clone LongMemCode, plug in your own adapter, and you can reproduce the baseline and grep numbers from Paper 1 on your own laptop. The structural reference adapter points to a running ArgosBrain instance: the binary is ours, but the protocol and the benchmark are open.
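To show roughly what plugging in an adapter involves, here is a hypothetical sketch of an adapter interface and a scoring loop with a grep-style baseline. `MemoryAdapter`, `GrepAdapter`, `Scenario`, and `run` are invented for illustration; they are not LongMemCode's API, and the real harness, scenario format, and metrics are defined in the repo and in Paper 1.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Protocol


class MemoryAdapter(Protocol):
    """Hypothetical adapter contract: ingest a corpus, then answer queries."""
    def ingest(self, files: dict[str, str]) -> None: ...
    def query(self, text: str) -> list[str]: ...


class GrepAdapter:
    """Substring-search baseline: returns the paths of files containing the query."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def ingest(self, files: dict[str, str]) -> None:
        self.files = dict(files)

    def query(self, text: str) -> list[str]:
        return sorted(path for path, src in self.files.items() if text in src)


@dataclass
class Scenario:
    query: str
    expected: str  # a path that a correct answer must include


def run(adapter: MemoryAdapter, files: dict[str, str], scenarios: list[Scenario]) -> dict[str, float]:
    """Score an adapter on hit rate and median per-query latency."""
    adapter.ingest(files)
    hits, latencies = 0, []
    for sc in scenarios:
        start = time.perf_counter()
        results = adapter.query(sc.query)
        latencies.append(time.perf_counter() - start)
        hits += sc.expected in results  # True counts as 1
    return {"accuracy": hits / len(scenarios), "median_latency_s": statistics.median(latencies)}


if __name__ == "__main__":
    corpus = {"sched.go": "func Schedule() {}", "main.go": "func main() { Schedule() }"}
    cases = [Scenario("func Schedule", "sched.go"), Scenario("func Reconcile", "ctrl.go")]
    print(run(GrepAdapter(), corpus, cases))
```

In this sketch, scoring another system means writing a class with the same two methods; the loop never looks behind them, which is what lets very different backends be compared on the same scenarios.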