Aeon: High-Performance Neuro-Symbolic Memory Management for Long-Horizon LLM Agents

Authors: Mustafa Arslan

2026

TL;DR

Aeon pairs a neuro-symbolic Atlas with a Semantic Lookaside Buffer and INT8 quantization to reach 4.70 ns dot products and 3.09 µs tree traversals at 100K nodes.


THE PROBLEM

The Context Bottleneck and Vector Haze in long-horizon agents

LLMs hit the Context Bottleneck because self-attention scales as O(N²), and they suffer Lost-in-the-Middle degradation as context windows grow.

Flat RAG with vector databases creates Vector Haze, where long-horizon agents retrieve semantically similar but episodically disjoint facts, confusing planning and degrading multi-day task execution.

HOW IT WORKS

Aeon Cognitive Operating System architecture

Aeon centers on a Core-Shell model in which a C++23 kernel manages the Atlas, Trace, Semantic Lookaside Buffer, Write Ahead Log, and Sidecar Blob Arena as OS-like memory resources.

You can think of Aeon as an OS memory manager: the Atlas is paged disk, the Semantic Lookaside Buffer is an L1 cache, and the Trace is a process scheduler for episodic graphs.

This Zero-Copy Constraint plus the neuro-symbolic structure lets Aeon manage long-horizon memory with sub-microsecond access, something a plain context window or Flat RAG cannot provide.
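To make the analogy concrete, here is a minimal sketch of how those three resources could be declared. Aeon's C++23 kernel is described in the paper but not published, so every type and field name below is illustrative:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Atlas node ("paged disk"): INT8-quantized embedding plus child links.
struct AtlasNode {
    std::vector<std::int8_t> embedding;   // symmetric INT8, D = 768
    float scale;                          // per-node dequantization scale
    std::vector<std::uint32_t> children;  // hierarchical spatial index fanout
};

// Semantic Lookaside Buffer slot ("L1 cache"): a dequantized FP32 vector.
struct SlbEntry {
    std::vector<float> vec;  // FP32 copy for exact similarity checks
    std::uint64_t node_id;   // back-reference into the Atlas
};

// Trace block ("scheduler unit"): a fixed-size run of episodic events.
struct TraceBlock {
    static constexpr std::size_t kSize = 1024;  // block size from the paper
    std::uint64_t event_ids[kSize];
    std::uint64_t t_min, t_max;  // coarse time window for the two-phase scan
};
```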

DIAGRAM

Semantic Lookaside Buffer retrieval pipeline

This diagram shows how Aeon routes a query through the Semantic Lookaside Buffer and falls back to the Atlas on a cache miss, i.e. when no cached entry meets the similarity threshold.

DIAGRAM

Aeon benchmarking and evaluation pipeline

This diagram shows how Aeon evaluates kernel performance, Atlas traversal, SLB caching, and Epoch-Based Reclamation (EBR) contention using synthetic Dense Forest datasets.

PROCESS

How Aeon Handles a Long Horizon Session

  1. Greedy SIMD Descent

    Aeon uses Greedy SIMD Descent in the Atlas with symmetric INT8 quantization to navigate a hierarchical spatial index in O(log_B M) time (sketched in code after this list).

  2. Semantic Lookaside Buffer

    Aeon inserts dequantized FP32 vectors into the Semantic Lookaside Buffer ring buffer, exploiting conversational locality for sub-5 µs cache hits (see the ring-buffer sketch below).

  3. Trace Block Index

    Aeon groups episodic events into Trace Blocks of size 1024 and runs a two-phase SIMD scan to retrieve relevant time windows in O(|V|/B) (also sketched below).

  4. Double Buffered Shadow Compaction

    Aeon performs Double Buffered Shadow Compaction with microsecond-scale freezes, background copying, and a hot swap to compact the Atlas and Sidecar Blob Arena without stalling queries.
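Step 01 can be sketched concretely. The code below is a hypothetical scalar rendering (the real kernel uses explicit SIMD intrinsics and its own node layout); it shows the INT8 dot product and the greedy argmax descent the step describes:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kDim = 768;  // D = 768, as in the paper's benchmarks

struct Node {
    std::int8_t code[kDim];               // symmetric INT8 embedding
    float scale;                          // per-node dequantization scale
    std::vector<std::uint32_t> children;  // empty => leaf
};

// INT8 dot product; a plain loop like this autovectorizes, and the paper's
// SIMD version reaches ~4.70 ns per product.
std::int32_t dot_i8(const std::int8_t* a, const std::int8_t* b) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < kDim; ++i)
        acc += std::int32_t(a[i]) * std::int32_t(b[i]);
    return acc;
}

// Score a node against a quantized query; the query's own scale is a
// constant factor across children, so it cancels in the argmax.
float score(const Node& n, const std::int8_t* q) {
    return n.scale * float(dot_i8(q, n.code));
}

// Greedy descent: at each level, follow the child with the highest
// quantized similarity, giving O(log_B M) hops at branching factor B.
std::uint32_t greedy_descent(const std::vector<Node>& atlas,
                             const std::int8_t* query, std::uint32_t root) {
    std::uint32_t cur = root;
    while (!atlas[cur].children.empty()) {
        std::uint32_t best = atlas[cur].children.front();
        float best_score = score(atlas[best], query);
        for (std::uint32_t c : atlas[cur].children) {
            float s = score(atlas[c], query);
            if (s > best_score) { best_score = s; best = c; }
        }
        cur = best;  // descend toward the most similar subtree
    }
    return cur;  // leaf candidate, handed to the SLB for FP32 re-ranking
}
```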
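Step 02's ring buffer can be sketched similarly; the capacity, FIFO eviction, and similarity threshold below are illustrative assumptions, not Aeon's actual parameters:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical fixed-capacity ring buffer for the SLB. On an Atlas hit,
// the dequantized FP32 vector is promoted here; conversational locality
// makes repeat queries land on this FP32 fast path.
struct Slb {
    struct Entry { std::vector<float> vec; std::uint64_t node_id; };
    std::vector<Entry> slots;
    std::size_t head = 0;
    std::size_t cap;

    explicit Slb(std::size_t capacity) : cap(capacity) { slots.reserve(cap); }

    // Overwrite the oldest slot once full (FIFO ring semantics).
    void promote(std::vector<float> vec, std::uint64_t id) {
        if (slots.size() < cap) slots.push_back({std::move(vec), id});
        else { slots[head] = {std::move(vec), id}; head = (head + 1) % cap; }
    }

    // Exact FP32 cosine check against cached entries; the paper reports
    // sub-5 µs latencies on hits.
    std::optional<std::uint64_t> lookup(const std::vector<float>& q,
                                        float min_cos) const {
        for (const Entry& e : slots) {
            double dot = 0, nq = 0, ne = 0;
            for (std::size_t i = 0; i < q.size(); ++i) {
                dot += q[i] * e.vec[i];
                nq  += q[i] * q[i];
                ne  += e.vec[i] * e.vec[i];
            }
            if (dot / (std::sqrt(nq) * std::sqrt(ne) + 1e-12) >= min_cos)
                return e.node_id;  // hit: skip the Atlas entirely
        }
        return std::nullopt;  // miss: fall back to greedy descent
    }
};
```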
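And a sketch of step 03's two-phase scan, assuming each block caches its time range so phase one can reject whole blocks before phase two touches individual events:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical episodic layout: events grouped into blocks of 1024.
struct EpisodeBlock {
    static constexpr std::size_t kSize = 1024;
    std::uint64_t t_min, t_max;       // block time range (phase-1 key)
    std::uint64_t timestamps[kSize];  // per-event timestamps (phase 2)
    std::uint64_t event_ids[kSize];
};

// Phase 1 touches only |V|/B block headers, skipping blocks whose
// [t_min, t_max] range misses the query window; phase 2 scans the few
// surviving blocks event by event (SIMD-friendly contiguous arrays).
std::vector<std::uint64_t> scan_window(const std::vector<EpisodeBlock>& trace,
                                       std::uint64_t t_lo, std::uint64_t t_hi) {
    std::vector<std::uint64_t> out;
    for (const EpisodeBlock& b : trace) {
        if (b.t_max < t_lo || b.t_min > t_hi) continue;  // phase 1: skip
        for (std::size_t i = 0; i < EpisodeBlock::kSize; ++i)  // phase 2
            if (b.timestamps[i] >= t_lo && b.timestamps[i] <= t_hi)
                out.push_back(b.event_ids[i]);
    }
    return out;
}
```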

KEY CONTRIBUTIONS

Key Contributions

  • Atlas with INT8 Quantization

    Aeon introduces symmetric INT8 scalar quantization in the Atlas, shrinking node stride from 3,392 bytes to 1,088 bytes at D = 768 and achieving 5.6× faster dot products (see the quantization sketch after this list).

  • Write Ahead Log

    Aeon implements a decoupled Write Ahead Log with a 3-step lock-ordering protocol, adding less than 1% overhead to insert latency while ensuring crash recoverability.

  • Sidecar Blob Arena

    Aeon adds a Sidecar Blob Arena with mmap-backed blobs and generational garbage collection, eliminating the 440-character text ceiling for episodic Trace events.
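The 3,392 → 1,088 byte stride follows from the storage math: at D = 768, FP32 vector data is 768 × 4 = 3,072 B while INT8 is 768 × 1 = 768 B, implying roughly 320 B of fixed per-node metadata in both layouts. Below is a minimal sketch of symmetric INT8 scalar quantization consistent with that scheme (hypothetical code; the paper's quantizer is not published):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Symmetric INT8 scalar quantization: one scale per vector, zero-point 0.
struct QuantVec {
    std::vector<std::int8_t> code;
    float scale;  // stored alongside the codes for dequantization
};

QuantVec quantize(const std::vector<float>& v) {
    float amax = 0.f;
    for (float x : v) amax = std::max(amax, std::fabs(x));
    const float scale = amax > 0.f ? amax / 127.f : 1.f;
    QuantVec q{std::vector<std::int8_t>(v.size()), scale};
    for (std::size_t i = 0; i < v.size(); ++i)
        q.code[i] = static_cast<std::int8_t>(std::lround(v[i] / scale));
    return q;
}

// Dequantize back to FP32, e.g. when promoting a vector into the SLB.
std::vector<float> dequantize(const QuantVec& q) {
    std::vector<float> v(q.code.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = float(q.code[i]) * q.scale;
    return v;
}
```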

RESULTS

By the Numbers

  • INT8 dot product latency: 4.70 ns (-21.8 ns vs FP32 cosine at 26.5 ns)

  • Tree traversal at 100K nodes: 3.09 µs (-7.41 µs vs FP32 traversal at 10.5 µs)

  • Index file size at 100K nodes: 141 MB (-299 MB vs FP32 index at 440 MB)

  • P99 read latency: 750 ns under hostile 16-thread contention with Epoch-Based Reclamation

Aeon is evaluated on synthetic Dense Forest datasets at D = 768 using Google Benchmark on an Apple M4 Max, measuring kernel latency, Atlas traversal, and concurrency. The main result shows Aeon's INT8 Atlas achieves 3.4× faster traversal and 3.1× smaller indexes than FP32 while keeping P99 reads under 1 µs under 16-thread contention.

BENCHMARK

Kernel and Atlas Performance Impact of INT8 Storage

Latency and file size comparison for FP32 versus INT8 Atlas storage at D = 768 and N = 100,000.

  Metric                 FP32       INT8       Improvement
  Dot product latency    26.5 ns    4.70 ns    5.6×
  Tree traversal         10.5 µs    3.09 µs    3.4×
  Index file size        440 MB     141 MB     3.1×

KEY INSIGHT

The Counterintuitive Finding

Enabling Aeon’s Write Ahead Log adds less than 1% overhead, with median insert latency changing from 2.24 µs to 2.23 µs at 10,000 inserts.

This is surprising because WALs usually introduce noticeable disk-flush costs, but Aeon's 3-step lock ordering hides fdatasync latency behind RAM delta updates.
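The exact protocol is internal to Aeon, but the general "append in RAM, publish the delta, flush off the critical path" pattern that produces this result can be sketched as follows (lock names and the step split are illustrative assumptions):

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

struct WalRecord {
    std::uint64_t node_id;
    std::vector<std::int8_t> payload;
};

std::mutex wal_mu;                  // protects the WAL buffer
std::mutex index_mu;                // protects the live in-RAM index
std::vector<WalRecord> wal_buffer;  // drained by a background flusher

void insert(const WalRecord& rec) {
    {   // Step 1: append to the in-RAM WAL buffer (microseconds, no I/O).
        std::lock_guard<std::mutex> g(wal_mu);
        wal_buffer.push_back(rec);
    }
    {   // Step 2: publish the delta to the live index; readers see it now.
        std::lock_guard<std::mutex> g(index_mu);
        // ... apply rec to the in-RAM index structures ...
    }
    // Step 3: a background flusher drains wal_buffer and calls fdatasync,
    // so the insert path never blocks on the disk; this is why the median
    // insert latency barely moves when the WAL is enabled.
}
```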

WHY IT MATTERS

What this unlocks for the field

Aeon makes sub-microsecond, crash-safe, long-horizon memory practical by combining INT8 Atlas storage, Semantic Lookaside Buffer caching, and zero-copy Core-Shell integration.

Builders can now ship LLM agents that maintain week-scale episodic memory with 3.1× smaller indexes and 5.6× faster similarity search than FP32 Flat RAG systems.


Related papers

Cognitive Architecture · Agent Memory

Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents

Bin Wen, Ruoxuan Zhang et al. · 2026

Neuro-Symbolic Dual Memory Framework uses Progress Memory, Feasibility Memory, a Blueprint Planner Agent, a Progress Monitor Agent, and an Actor Agent to decouple semantic progress guidance from executable feasibility checks. On ALFWorld, Neuro-Symbolic Dual Memory Framework achieves 94.78% success rate versus 88.81% for AWM, and on WebShop reaches 0.7132 score versus 0.5998 for WALL-E 2.0.

Cognitive Architecture · Agent Memory

D-Mem: A Dual-Process Memory System for LLM Agents

Zhixing You, Jiachen Yuan, Jason Cai · 2026

D-Mem combines Mem0∗, Quality Gating, and Full Deliberation into a dual-process memory system that incrementally stores vector memories and selectively scans raw history. On LoCoMo with GPT-4o-mini, D-Mem’s Quality Gating reaches 53.5 F1 versus the Mem0∗ baseline’s 51.2 F1, recovering 96.7% of the 55.3 F1 Full Deliberation performance with far fewer tokens.

Cognitive Architecture · Long-Term Memory

Human-Like Lifelong Memory: A Neuroscience-Grounded Architecture for Infinite Interaction

Diego C. Lerma-Torres · 2026

Human-Like Lifelong Memory combines Executive Function and Working Memory, a Memory Service Knowledge Graph, and a Thalamic Gateway to implement dual-process, valence-aware lifelong memory. Human-Like Lifelong Memory is a theoretical framework with seven functional properties and testable predictions rather than benchmark numbers against specific baselines.
