Self-Attentive Associative Memory

Authors: Hung Le, Truyen Tran, Svetha Venkatesh

2020

TL;DR

The paper's SAM-based Two-memory Model (STM) uses the Self-attentive Associative Memory (SAM) operator with outer-product self-attention to reach 0.39% mean error on bAbI, beating MNM-p by 0.16 points.



THE PROBLEM

Neural memories lack explicit relational storage and reuse

Existing memory-augmented networks store items but not rich relationships, leading to lossy memory interactions and weak relational reasoning.

STM targets tasks like Nth-farthest and Relational Associative Recall, where the absence of relational memory prevents retrieval of items conditioned on complex relationships between them.
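To make the relational demand concrete, here is a minimal NumPy sketch of the kind of query Nth-farthest poses. The task comes from the relational-reasoning benchmark STM is evaluated on; the encoding below is illustrative, not the paper's input format:

```python
import numpy as np

# Illustrative Nth-farthest query: "which vector is the n-th farthest
# from vector m?" Answering requires pairwise distances, i.e. relations
# between stored items, not any single item on its own.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(8, 16))        # 8 labeled items, 16-dim each
n_th, m = 3, 5                            # query: 3rd farthest from item 5

dists = np.linalg.norm(vectors - vectors[m], axis=1)
answer = int(np.argsort(-dists)[n_th - 1])   # label of the n-th farthest
print(answer)
```

A memory that stores only the items must recompute all pairwise distances at query time; a relational memory can store those distances as they form.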

HOW IT WORKS

Self-Attentive Associative Memory and the SAM-based Two-memory Model

STM, the SAM-based Two-memory Model, centers on Outer Product Attention (OPA) and the Self-attentive Associative Memory (SAM) operator, wired through Mi-Write, Mr-Read, Mi-Read Mr-Write, and Mr-Transfer between the item and relational memories.

You can view STM as a brain-inspired system in which the associative item memory plays the role of the perirhinal cortex and the higher-order relational memory that of the hippocampus, linked by outer-product attention.

By using SAM’s outer-product bindings instead of plain dot-product attention, STM preserves bit-level relationships that a fixed context window or scalar attention scores cannot represent or reuse.
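The contrast is easiest to see in code. Below is a minimal sketch assuming the attention weights come from a standard scaled dot product and each value is bound to its key by an outer product; the paper's exact OPA definition may differ in normalization details. The point is the output shape: dot-product attention collapses everything to a d-vector, while the outer-product binding keeps a full d × d association per query.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
n, d = 10, 16
q = rng.normal(size=d)           # one query
K = rng.normal(size=(n, d))      # keys
V = rng.normal(size=(n, d))      # values

w = softmax(q @ K.T / np.sqrt(d))             # shared attention weights, (n,)

dot_out = w @ V                               # dot-product attention: (d,)
opa_out = np.einsum('i,ij,ik->jk', w, V, K)   # outer-product binding: (d, d)

# dot_out collapses each value to a scalar-weighted vector; opa_out keeps
# a full d x d value-key association, so element-level pairings survive.
print(dot_out.shape, opa_out.shape)           # (16,) (16, 16)
```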

DIAGRAM

Sequential interaction between item and relational memory in STM

This diagram shows how STM processes a timestep, updating Mi and Mr and producing ot.

DIAGRAM

Evaluation pipeline across tasks for STM

This diagram shows how STM is trained and evaluated on synthetic, geometric, RL, and bAbI tasks.

PROCESS

How Self-Attentive Associative Memory Handles a Sequential Task

  1. Mi-Write

    STM uses Mi-Write to encode xt into the item memory Mi via the outer product Xt = f1(xt) ⊗ f2(xt), followed by the gated update of Eq. 10.

  2. Mr-Read

    STM applies Mr-Read to contract the relational memory Mr with f3(xt) and f2(xt), producing a read-out vr_t that summarizes distant relational information.

  3. Mi-Read Mr-Write

    STM feeds Mi_t and vr_t ⊗ f2(xt) into the SAM operator, using Outer Product Attention (OPA) to write new hetero-associative memories into Mr_t.

  4. Mr-Transfer and Output Distillation

    STM uses Mr-Transfer with G1 to enrich Mi with higher-order relations, and applies G2 and G3 to distill Mr into the output vector ot. A hedged sketch of the full timestep follows this list.
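Below is a compact NumPy sketch of one STM timestep under loud simplifying assumptions: f1–f3 and the SAM projections are plain linear maps, a fixed decay stands in for the learned gating of Eq. 10, the vr_t ⊗ f2(xt) feed into SAM is omitted, G1–G3 collapse into a single read-out, and the shapes (Mi as n × d, Mr as nq × d × d) are chosen for illustration rather than taken from the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, n, d, nq = 32, 8, 16, 8   # illustrative sizes (assumed, not from the paper)

# Assumed linear maps standing in for the paper's f1-f3.
f1 = rng.normal(size=(d_in, n)) * 0.1
f2 = rng.normal(size=(d_in, d)) * 0.1
f3 = rng.normal(size=(d_in, d)) * 0.1
# Assumed projections for the SAM operator's queries/keys/values.
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sam(M):
    """SAM-like lift: matrix memory (n, d) -> relational tensor (nq, d, d)."""
    Q, K, V = M[:nq] @ Wq, M @ Wk, M @ Wv
    A = softmax(Q @ K.T / np.sqrt(d))              # (nq, n) attention weights
    return np.einsum('qi,ij,ik->qjk', A, V, K)     # outer-product bindings

Mi = np.zeros((n, d))
Mr = np.zeros((nq, d, d))
x = rng.normal(size=d_in)                          # input x_t at one timestep

# 1) Mi-Write: bind x_t into the item memory. A fixed 0.9 decay stands in
#    for the learned gated update of the paper's Eq. 10.
Mi = 0.9 * Mi + np.outer(x @ f1, x @ f2)

# 2) Mr-Read: contract the relational tensor with two views of x_t,
#    yielding a relational read-out (the paper's vr_t, simplified here).
vr = np.einsum('qjk,j,k->q', Mr, x @ f3, x @ f2)

# 3) Mi-Read / Mr-Write: lift the item memory into relational storage via
#    the SAM-like operator. (The paper also writes vr_t ⊗ f2(xt) into Mr;
#    that hetero-associative term is omitted in this sketch.)
Mr = 0.9 * Mr + sam(Mi)

# 4) Output distillation: G1-G3 collapsed into one assumed linear read-out.
G = rng.normal(size=(nq * d * d, d)) * 0.01
o_t = Mr.reshape(-1) @ G
print(o_t.shape)   # (16,)
```

The sketch only shows the data flow: items are bound into Mi by outer products, SAM lifts Mi into third-order relational storage, and the output step distills Mr back into a vector.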

KEY CONTRIBUTIONS

Key Contributions

  • SAM-based Two-memory Model (STM)

    STM introduces a dual system with an item memory Mi and a relational memory Mr, linked by Mi-Write, Mr-Read, Mi-Read Mr-Write, and Mr-Transfer, to jointly support memorization and relational reasoning.

  • Self-attentive Associative Memory (SAM) operator

    The paper defines SAM, which uses Outer Product Attention (OPA) to transform a second-order item memory into a third-order relational memory storing d² scalars per query.

  • State-of-the-art bAbI performance

    STM achieves 0.39 ± 0.18 mean error and 0.15 best error on bAbI, improving over MNM-p's 0.55 ± 0.74 and 0.18.

RESULTS

By the Numbers

Mean error: 0.39% (−0.16 vs MNM-p's 0.55%)

Best error: 0.15% (−0.03 vs MNM-p's 0.18%)

Associative retrieval (length 30): 10 epochs to converge (−25 vs WeiNet's 35; −40 vs Fast weights' 50)

Nth-farthest accuracy: 98% (+7 points vs RMC's 91%)

On the bAbI question-answering benchmark, which tests 20 reasoning tasks, STM achieves 0.39 ± 0.18 mean error and 0.15 best error. These results show that STM's dual item–relational memory with SAM improves both accuracy and stability over prior memory networks like MNM-p and DNC.


BENCHMARK

bAbI task: mean error over 20 tasks

Mean error (%) on the joint bAbI 20-task benchmark.

BENCHMARK

Nth-farthest task: test accuracy comparison

Test accuracy (%) on the Nth-farthest relational reasoning task.

KEY INSIGHT

The Counterintuitive Finding

With nq = 8, STM reaches 98% accuracy on Nth-farthest, while TPR achieves only 13% despite being designed for reasoning.

This is surprising because high-order fast-weight models like TPR are expected to excel at relational tasks, yet STM’s SAM-based dual memory yields an 85-point advantage.

WHY IT MATTERS

What this unlocks for the field

STM unlocks reusable, high-fidelity relational memory that can be read, updated, and distilled across long sequences.

Builders can now design agents that jointly memorize rich items and their higher-order relationships, enabling tasks like Relational Associative Recall and feature-based graph reasoning that were previously brittle or impractical.


Related papers

Memory Architecture

A Control Architecture for Training-Free Memory Use

Yanzhen Lu, Muchen Jiang et al.

· 2026

TAG routes low-confidence steps via uncertainty-based routing, filters them with guarded acceptance and rollback, selects between rule and exemplar memory banks, and prunes entries via evidence-based retirement, all inside a unified control loop. On SVAMP and ASDiv, TAG reaches 81.0% and 85.2% accuracy, improving over the 74.0% and 77.5% no-memory baselines while a compute-matched Retry baseline stays flat.
