Papers

Three papers. Two benchmarks. One architecture.

We publish our work in the open so you can verify the numbers on the site without taking our word for it. Every paper here is CC BY 4.0 and the LongMemCode benchmark is MIT. The engine itself is commercial — but adapter stubs for competitors are in the benchmark repo, so anyone can reproduce the headline numbers. If something is wrong, the data is on GitHub for you to prove it.

New · 2026-04-22

LongMemCode kubernetes-2k — Kubernetes scale corpus, four adapters compared

The benchmark now includes a scale corpus (Kubernetes v1.32.0, 38 771 project symbols, 1 456 scenarios) and two new reference adapters: sbert-faiss (a dense-semantic baseline, zero LLM) and mem0 @ gpt-4o-mini (Mem0's open-source general-purpose memory system, evaluated on a code-structural benchmark). A general-purpose memory system scoring low here is a measurement of scope match, not a verdict on its core job — Mem0's intended workload is conversational memory. Full framing, per-category breakdown, and every adapter's exact configuration are in the linked report.

See headline table → · Full report on GitHub →

01 The collection

Read the foundation first.

Paper 1 specifies the LongMemCode benchmark and reports baselines; it is the empirical anchor the other papers cite. Paper 2 argues for the structural-versus-semantic split as a design principle. Paper 3 describes Neurogenesis, the graph-first engine behind ArgosBrain, in sufficient detail to reproduce the retrieval behaviour. Engineering writeups, field reports, and case studies live on the blog; we keep this collection scoped to peer-review-style research.

Paper 1 · cs.SE / cs.AI

LongMemCode: A Deterministic Benchmark for Code-Memory in AI Agents

Aurelian Jibleanu · Neurogenesis · April 2026

We introduce LongMemCode, a public benchmark for evaluating the retrieval component of memory systems used by AI coding agents. Existing benchmarks measure either conversational long-term memory (LongMemEval, LoCoMo) or end-to-end agent task success (SWE-bench); none isolates the retrieval quality, speed, and compression of a memory system at coding-agent workloads.

Read the paper →

Paper 2 · cs.SE / cs.IR

Structural vs Semantic Retrieval in Code-Memory: A Query-Type Taxonomy

Aurelian Jibleanu · Neurogenesis · April 2026

We propose a taxonomy of retrieval queries for AI coding agents and argue that code-memory systems require separate treatment of two query classes rather than a unified retrieval layer. Structural queries admit exact answers derivable from a semantic graph of canonical identifiers; semantic queries are best served by vector retrieval over embedded code chunks.

Read the paper →

Paper 3 · cs.SE / cs.PL

Zero-Cost Graph Retrieval at Compiler-Grade Depth for AI Coding Agents

Aurelian Jibleanu · Neurogenesis · April 2026

We describe Neurogenesis, a graph-first code-memory engine that answers structural retrieval queries for AI coding agents without any LLM call on the read path. The engine ingests source code into a canonical-identifier graph via a tiered pipeline that selects the highest-precision indexing technology available per language.

Read the paper →

Cite. All three are preprints (April 2026). Use the arXiv identifiers once they are assigned; until then, cite the title, the author (Aurelian Jibleanu, Neurogenesis), the year (2026), and the canonical URL on this site. A BibTeX block will be published alongside each paper once the arXiv IDs land.

02 Reproduce

The benchmark is MIT. The engine is commercial.

Clone LongMemCode, plug in your own adapter, and you can reproduce the baseline and grep numbers from Paper 1 on your own laptop. The structural reference adapter points to a running ArgosBrain instance: the binary is ours, while the protocol and the benchmark are open.
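To give a feel for what "plug in your own adapter" could involve, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `MemoryAdapter` interface, the `GrepAdapter` class, the `path:line` answer format, and the `accuracy` scorer are hypothetical names, not the benchmark's actual protocol. The real adapter contract and scoring rules are the ones defined in the LongMemCode repo.

```python
import re
from dataclasses import dataclass, field
from typing import Protocol


class MemoryAdapter(Protocol):
    """Hypothetical adapter shape; the real protocol lives in the benchmark repo."""

    def ingest(self, files: dict[str, str]) -> None: ...

    def query(self, text: str, top_k: int = 5) -> list[str]: ...


@dataclass
class GrepAdapter:
    """Toy grep-style baseline: whole-word matches, reported as 'path:line'."""

    files: dict[str, str] = field(default_factory=dict)

    def ingest(self, files: dict[str, str]) -> None:
        # Keep the corpus in memory; a real adapter would build an index here.
        self.files = dict(files)

    def query(self, text: str, top_k: int = 5) -> list[str]:
        pattern = re.compile(rf"\b{re.escape(text)}\b")
        hits: list[str] = []
        for path in sorted(self.files):  # sorted for deterministic output
            lines = self.files[path].splitlines()
            for lineno, line in enumerate(lines, start=1):
                if pattern.search(line):
                    hits.append(f"{path}:{lineno}")
        return hits[:top_k]


def accuracy(adapter: MemoryAdapter, scenarios: list[tuple[str, str]]) -> float:
    """Fraction of (query, expected_location) pairs found in the top-k answer."""
    found = sum(1 for q, expected in scenarios if expected in adapter.query(q))
    return found / len(scenarios)


if __name__ == "__main__":
    corpus = {
        "a.py": "def foo():\n    return bar()\n",
        "b.py": "def bar():\n    pass\n",
    }
    adapter = GrepAdapter()
    adapter.ingest(corpus)
    print(adapter.query("bar"))
    print(accuracy(adapter, [("foo", "a.py:1"), ("bar", "b.py:1")]))
```

The point of the sketch is the separation of concerns: the adapter only ingests a corpus and answers queries, while scoring stays outside it, so any memory system that can satisfy the same two calls can be measured against the same scenarios.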