Memory-Augmented Transformers: A Systematic Review from Neuroscience Principles to Enhanced Model Architectures

Authors: Parsa Omidi, Xingshuai Huang, Axel Laborieux, Bahareh Nikpour

arXiv 2025

TL;DR

This systematic review unifies neuroscience-inspired memory principles with a taxonomy of memory operations to chart a roadmap toward cognitively inspired, lifelong-learning Transformer architectures.

THE PROBLEM

Transformers Forget Beyond Fixed Context Windows and Static Weights

Standard Transformers suffer from the quadratic complexity of self-attention, forcing limited context windows and aggressive KV-cache eviction that discards important information.

This breaks long-horizon reasoning and continual learning, causing loss of user-specific context, brittle adaptation, and severe energy inefficiency compared to biological memory.
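The eviction problem above can be illustrated with a minimal sketch (the class and its behavior are illustrative, not taken from any particular system): a fixed-size KV cache that always drops the oldest entries loses early context regardless of how important it was.

```python
from collections import deque

class SlidingKVCache:
    """Toy KV cache with a fixed window: oldest entries are evicted first."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.cache = deque()  # (key, value) pairs, oldest first

    def append(self, key, value):
        self.cache.append((key, value))
        while len(self.cache) > self.max_tokens:
            self.cache.popleft()  # evict the oldest entry -- its information is gone

    def keys(self):
        return [k for k, _ in self.cache]

cache = SlidingKVCache(max_tokens=3)
for tok in ["user_name", "city", "budget", "deadline"]:
    cache.append(tok, tok.upper())

print(cache.keys())  # "user_name" was evicted despite being important
```

A memory-augmented design replaces this blind eviction with explicit policies for what to keep, consolidate, or forget.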

HOW IT WORKS

Memory-Augmented Transformers — Neuroscience Principles to Taxonomic Design

The review organizes memory architectures for Transformers along three axes: functional objectives, memory types, and integration techniques.

An analogy is a brain-inspired computer in which sensory buffers, working memory, and long-term stores act like coordinated RAM, cache, and disk under neuromodulatory control.

This organization enables mechanisms such as multi-timescale memory, surprise-gated updates, and hierarchical buffering that plain context windows and static KV caches cannot provide.
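A surprise-gated update can be sketched in a few lines. This is a minimal illustration of the general idea, not any specific system from the survey: surprise is measured as the negative log-probability of an observation, and only sufficiently surprising items are written to the long-term store (the threshold value is an arbitrary choice for the example).

```python
import math

def surprise(prob):
    """Surprise as negative log-probability of the observed token."""
    return -math.log(prob)

def surprise_gated_write(memory, item, prob, threshold=2.0):
    """Write item to the long-term store only when it is surprising enough."""
    if surprise(prob) > threshold:
        memory.append(item)
    return memory

memory = []
observations = [("the", 0.9), ("cat", 0.4), ("defenestrated", 0.01)]
for token, prob in observations:
    surprise_gated_write(memory, token, prob)

print(memory)  # only the low-probability (surprising) token is stored
```

The gate keeps routine, predictable inputs out of the long-term store, mirroring the neuromodulatory gating the review draws from biology.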

DIAGRAM

Three-Axis Taxonomy of Memory-Augmented Transformers

This diagram shows how the survey organizes methods by functional objectives, memory types, and integration techniques.

DIAGRAM

Evaluation Dimensions and Challenge Landscape

This diagram shows how the survey evaluates methods along memory operations, open challenges, and future directions.

PROCESS

How Memory-Augmented Transformers Structures a Memory System Lifecycle

  1. Architecture of Human Memory

    The review first analyzes the architecture of human memory, detailing sensory memory, working memory, and long-term memory as guiding components.

  2. Interactions Between Memory Systems

    It then studies interactions between memory systems, emphasizing encoding, consolidation, retrieval, and top-down modulation as coordination patterns.

  3. Computational Principles from Biological Memory

    Next, it extracts computational principles from biological memory, such as hierarchical resource allocation and neuromodulatory gating, to inform Transformer design.

  4. Taxonomy of Memory-Augmented Transformers

    Finally, it builds the taxonomy of memory-augmented Transformers, organizing methods by functional objectives, memory types, and integration techniques.
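The three-axis taxonomy can be sketched as a simple data structure. The axis labels and method annotations below are hypothetical placeholders chosen for illustration; the survey's own category names and assignments may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryMethod:
    """One surveyed method, annotated along the taxonomy's three axes."""
    name: str
    functional_objective: str  # e.g. context extension, lifelong learning
    memory_type: str           # e.g. explicit, parameter-encoded, state-based, hybrid
    integration: str           # e.g. attention fusion, gated control, associative retrieval

methods = [
    MemoryMethod("ARMT", "context extension", "explicit", "associative retrieval"),
    MemoryMethod("EM-LLM", "context extension", "explicit", "episodic segmentation"),
]

# Group methods along one axis of the taxonomy.
by_type = {}
for m in methods:
    by_type.setdefault(m.memory_type, []).append(m.name)
print(by_type)
```

Grouping along any single axis gives one "slice" of the taxonomy; the survey's contribution is that all three slices apply simultaneously to each method.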

KEY CONTRIBUTIONS

Key Contributions

  • Comprehensive Three-Axis Taxonomy

    The survey introduces a three-axis taxonomy, using functional objectives, memory types, and integration techniques to categorize diverse memory architectures.

  • Biological-to-Transformer Mapping

    It systematically maps sensory memory, working memory, and long-term memory to embeddings, attention contexts, and parametric or non-parametric memory.

  • Analysis of Memory Operations

    It analyzes reading, writing, forgetting, and capacity-management mechanisms, highlighting the shift toward adaptive, test-time learning and surprise-gated updates.
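The four memory operations can be made concrete with a toy store. Everything here (the usage-count reinforcement, the least-used eviction heuristic) is an illustrative assumption, not a mechanism from any specific surveyed system.

```python
class MemoryStore:
    """Toy explicit memory with the four analyzed operations:
    reading, writing, forgetting, and capacity management."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = {}  # key -> (value, usage_count)

    def write(self, key, value):
        self.slots[key] = (value, 0)
        self._manage_capacity()

    def read(self, key):
        if key not in self.slots:
            return None
        value, count = self.slots[key]
        self.slots[key] = (value, count + 1)  # reads reinforce the trace
        return value

    def forget(self, key):
        self.slots.pop(key, None)

    def _manage_capacity(self):
        # Evict the least-used entry when over capacity (simple usage heuristic).
        while len(self.slots) > self.capacity:
            least = min(self.slots, key=lambda k: self.slots[k][1])
            self.forget(least)

mem = MemoryStore(capacity=2)
mem.write("a", 1)
mem.read("a")      # reinforce "a"
mem.write("b", 2)
mem.write("c", 3)  # over capacity: the least-used entry ("b") is evicted
print(sorted(mem.slots))
```

Adaptive systems replace the fixed heuristic in `_manage_capacity` with learned or surprise-driven policies, which is exactly the shift the survey highlights.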

RESULTS

By the Numbers

Context window scaling

up to 50M tokens

ARMT demonstrates associative retrieval over 50M tokens, versus basic Transformers limited by quadratic attention.

Long context range

up to 10M tokens

EM-LLM's episodic segmentation supports sequences up to 10M tokens, versus fixed-window baselines.

Working memory span

4–7 chunks

Matches human working-memory estimates (Cowan, 2008) for biological comparison.

Sensory trace duration

≈250 ms and 2–3 s

Iconic traces last ≈250 ms and echoic traces 2–3 s in biological sensory memory.

Because this is a survey, the quantitative highlights come from referenced systems such as ARMT and EM-LLM and from biological baselines. Together they show how memory-augmented Transformers span from millisecond sensory traces to multi-million-token artificial contexts.

BENCHMARK

Memory Types Emphasized in Memory-Augmented Transformers

Qualitative distribution of the memory-type categories discussed in the survey.

KEY INSIGHT

The Counterintuitive Finding

The survey highlights that less than 5% of brain activity is conscious, while over 95% operates unconsciously, including much of memory processing.

This challenges the assumption that powerful memory systems must be fully explicit and suggests Transformer memory should prioritize automatic, unconscious style operations.

WHY IT MATTERS

What this unlocks for the field

The survey provides a unified design language for building multi-timescale, adaptive memory systems grounded in biological principles and concrete architectural patterns.

Builders can now combine parameter-encoded, state-based, explicit, and hybrid memories with attention fusion, gated control, and associative retrieval in a principled way.
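Associative retrieval, one of the integration techniques above, can be sketched as content-based attention over an external memory. This minimal example is an assumption-laden illustration of the general pattern, not the mechanism of any particular surveyed system.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def associative_read(query, keys, values):
    """Content-based (associative) read: attention weights over memory keys,
    returning a weighted sum of the stored values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]
out = associative_read([1.0, 0.0], keys, values)
print(out)  # weighted toward the first slot's value
```

Because the read is by content rather than by position, a match can be retrieved no matter how long ago it was written, which is what lets systems like ARMT escape the fixed context window.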


Related papers

Survey · Agent Memory

Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations

Dongming Jiang, Yi Li et al.

arXiv 2026

Anatomy of Agentic Memory organizes Memory-Augmented Generation into four structures and empirically compares systems like AMem, MemoryOS, Nemori, MAGMA, and SimpleMem under benchmark saturation, metric validity, backbone sensitivity, and system cost. On the LoCoMo benchmark, it shows Nemori reaches 0.502 F1 while AMem drops to 0.116, and MAGMA achieves the top semantic judge score of 0.670 under the MAGMA rubric.

Memory Architecture · Survey

Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

Zhongming Yu, Naicheng Yu et al.

arXiv 2026

Multi-Agent Memory Architecture organizes **Agent IO Layer**, **Agent Cache Layer**, and **Agent Memory Layer** plus **Agent Cache Sharing** and **Agent Memory Access** protocols into a unified architectural framing for multi-agent systems. As a position paper, it reports no benchmark results or numeric comparisons against baselines.

Survey

From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs

Yaxiong Wu, Sheng Liang et al.

arXiv 2025

From Human Memory to AI Memory organizes LLM memory using the **3D-8Q Memory Taxonomy**, mapping human memory categories to personal and system memory across object, form, and time. From Human Memory to AI Memory reports no new benchmarks but consolidates systems like MemoryBank, HippoRAG, and MemoRAG into a single conceptual framework.