BayesPCN: A Continually Learnable Predictive Coding Associative Memory

Authors: Jason Yoo, Frank Wood

arXiv 2022

TL;DR

BayesPCN combines predictive coding with Bayesian weight updates to continually store up to 1024 high-dimensional images, with hetero-associative recall MSE as low as 0.0000.

THE PROBLEM

Continual associative memories catastrophically forget older data

Associative memories often lack safeguards against catastrophic forgetting, so continually training a generative predictive coding network (GPCN) causes it to forget information associated with older observations.

When a GPCN is trained online on long sequences, recall of high-dimensional images degrades, forcing costly offline refitting and limiting robust auto-associative and hetero-associative recall.

HOW IT WORKS

BayesPCN — Bayesian predictive coding associative memory

BayesPCN treats the weights W^{0:L} as latent random variables: predictive coding implements the read, sequential importance sampling implements the write, and a diffusion-based mechanism on the synaptic weight posterior implements the forget.

You can think of BayesPCN as a hippocampus-like module that continually updates a probabilistic weight memory, while predictive coding acts like iterative pattern completion in cortical layers.

This probabilistic update over weights lets BayesPCN perform one-shot continual writes and controlled forgetting, something a fixed context window or a purely gradient-based GPCN cannot provide. The generative model underneath is sketched below.
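
To fix notation, here is a minimal sketch of the hierarchical Gaussian generative model that GPCN-style predictive coding networks assume. The layer indexing, the nonlinearity f, and the covariance choices below are standard conventions, not taken from the paper, and may differ in detail from its exact parameterization.

```latex
% Hierarchical Gaussian generative model behind GPCN/BayesPCN (a sketch;
% the paper's exact parameterization may differ). Layer L is the top
% latent with parameters (mu, Sigma); layer 0 is the observation.
\begin{aligned}
  x^{L} &\sim \mathcal{N}(\mu, \Sigma) \\
  x^{l} \mid x^{l+1} &\sim \mathcal{N}\!\left(W^{l} f(x^{l+1}),\, \Sigma^{l}\right),
  \qquad l = L-1, \dots, 0
\end{aligned}
```

BayesPCN's departure from GPCN is to treat the weights themselves as random and maintain a posterior p(W^{0:L} | x^0_{1:t}); the per-layer posterior means R^l and precisions U^l, along with the top-layer µ and Σ, are exactly the quantities the forget operation later perturbs.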

DIAGRAM

BayesPCN read and write dynamics over time

This diagram shows how BayesPCN alternates predictive coding read with Bayesian write and forget as new images stream in.

DIAGRAM

Evaluation pipeline for continual image recall

This diagram shows how BayesPCN is evaluated on CIFAR10 and Tiny ImageNet with sequential writes and noisy reads.

PROCESS

How BayesPCN Handles a Continual Image Recall Task

  1. Read operation

     BayesPCN initializes x^0 with a corrupted query and uses predictive coding to hill-climb log p(x^0, h | x^0_{1:T}) over the hidden activations.

  2. Write operation

     BayesPCN applies sequential importance sampling and conjugate Bayesian updates to p(W^{0:L} | x^0_{1:t}), using hidden activations h_t sampled from predictive coding.

  3. Forget operation

     BayesPCN diffuses R^{1:L−1} and µ toward the prior and inflates U^{1:L−1} and Σ using forget strength β, countering memory overload.

  4. Posterior predictive read

     BayesPCN uses the mixture posterior p̂(W | x^0_{1:t}) to compute log p(x^0, h | x^0_{1:t−1}) and perform auto-associative or hetero-associative recall. A minimal code sketch of these operations follows this list.
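
Below is a minimal single-layer sketch of the read / write / forget loop in Python. It is illustrative only: the dimensions, nonlinearity, step sizes, and the collapse to one layer and a single posterior particle are all assumptions made for readability, whereas the actual model is hierarchical and maintains several importance-weighted posterior particles over W^{0:L}.

```python
import numpy as np

# Minimal single-layer sketch of BayesPCN-style read / write / forget.
# Assumptions (not from the paper): one weight matrix, a standard-normal
# hidden prior, a shared row-wise Gaussian posterior N(row of R, U^{-1})
# over W, and a single posterior particle.

rng = np.random.default_rng(0)
D_X, D_H, SIGMA2 = 64, 128, 0.1    # obs dim, hidden dim, obs noise variance

U = np.eye(D_H)                                       # prior precision per row of W
R0 = rng.standard_normal((D_X, D_H)) / np.sqrt(D_H)  # prior mean of W
R = R0.copy()                                         # posterior mean of W

def f(h):
    return np.tanh(h)              # elementwise nonlinearity

def read(x_query, n_steps=300, lr=0.01):
    """Predictive-coding read: gradient-ascend log p(x, h) over the hidden
    activity h under the current weight posterior mean, then reconstruct x."""
    h = rng.standard_normal(D_H) * 0.01
    for _ in range(n_steps):
        err = (x_query - R @ f(h)) / SIGMA2            # precision-weighted error
        grad_h = (1.0 - f(h) ** 2) * (R.T @ err) - h   # + grad of log N(h; 0, I)
        h += lr * grad_h
    return R @ f(h), h

def write(x, h):
    """One-shot conjugate normal-normal update of the weight posterior,
    given an (observation, inferred hidden activity) pair."""
    global R, U
    z = f(h)
    U_new = U + np.outer(z, z) / SIGMA2                # precision accumulates
    # Natural-parameter update: U_new R_new^T = U R^T + z x^T / sigma^2
    R = np.linalg.solve(U_new, U @ R.T + np.outer(z, x) / SIGMA2).T
    U = U_new

def forget(beta):
    """Diffuse the posterior toward the prior with strength beta in [0, 1]
    by interpolating the Gaussian natural parameters toward the prior's."""
    global R, U
    eta = U @ R.T                                      # precision-weighted mean
    U = (1.0 - beta) * U + beta * np.eye(D_H)
    R = np.linalg.solve(U, (1.0 - beta) * eta + beta * R0.T).T

# Continual loop: one-shot write each incoming pattern, occasionally forget,
# then recall the last pattern from a noisy query.
patterns = [rng.standard_normal(D_X) for _ in range(8)]
for x_t in patterns:
    _, h_t = read(x_t)             # infer a hidden code for the new pattern
    write(x_t, h_t)
forget(beta=0.01)
x_hat, _ = read(patterns[-1] + 0.3 * rng.standard_normal(D_X))
print("recall MSE:", float(np.mean((x_hat - patterns[-1]) ** 2)))
```

The structural point the sketch preserves is that write is a closed-form conjugate update given the activities that read infers, which is what makes one-shot continual storage possible without gradient retraining.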

KEY CONTRIBUTIONS

Key Contributions

  • BayesPCN continual associative memory

    BayesPCN introduces a hierarchical predictive coding network with Bayesian updates over W^{0:L} that can continually learn up to 1024 high-dimensional images with robust recall.

  • Sequential write as inference

    BayesPCN casts memory write as sequential importance sampling over p(W^{0:L} | x^0_{1:t}), using local conjugate normal-normal updates at each layer.

  • Diffusion-based forget mechanism

    BayesPCN proposes a diffusion-based forget that nudges R^{1:L−1} and µ toward the prior and increases U^{1:L−1} and Σ, enabling recovery of the original memory state after repeated forgetting. One plausible formalization is sketched after this list.
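
The summary above does not pin down the forget operator, so the following formalization is an assumption rather than the paper's verbatim equations: a standard way to realize "diffuse toward the prior with strength β" is Bayesian forgetting, which tempers the posterior against the prior.

```latex
% Bayesian forgetting with strength beta in [0, 1] -- one plausible reading
% of the diffusion-based forget, not necessarily the paper's exact operator.
p_{\text{new}}(W) \;\propto\; p\!\left(W \mid x^{0}_{1:t}\right)^{1-\beta} \, p(W)^{\beta}
```

For Gaussians this interpolates the natural parameters (precisions and precision-weighted means) linearly toward the prior's, so the posterior mean drifts toward the prior mean while the covariance inflates; repeated application drives the memory geometrically back to its initial state, consistent with the recovery property described above.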

RESULTS

By the Numbers

White Noise CIFAR10 MSE

0.0337

+0.0127 vs GPCN Online at sequence length 1024

White Noise Tiny ImageNet MSE

0.6606

+0.6417 vs GPCN Online at sequence length 1024

Mask CIFAR10 MSE

0.0019

−0.0772 vs GPCN Online at sequence length 1024

Mask Tiny ImageNet MSE

0.0000

−0.0698 vs GPCN Online at sequence length 1024

On CIFAR10 and Tiny ImageNet image recovery tasks with sequence lengths up to 1024, BayesPCN is compared against Identity, MHN, offline GPCN, and online GPCN using pixel-wise MSE. BayesPCN maintains near offline-GPCN quality on hetero-associative recall while online GPCN degrades sharply, most strikingly on masked Tiny ImageNet, where BayesPCN achieves 0.0000 MSE versus 0.0698 for GPCN Online.

BENCHMARK

White Noise CIFAR10 MSE at sequence length 1024

Pixel-wise MSE on CIFAR10 white-noise recall after 1024 sequential writes.

BENCHMARK

Mask CIFAR10 MSE at sequence length 1024

Pixel-wise MSE on CIFAR10 mask hetero-associative recall after 1024 sequential writes.

KEY INSIGHT

The Counterintuitive Finding

BayesPCN achieves 0.0000 MSE on Tiny ImageNet mask hetero-associative recall at sequence length 1024, while GPCN Online reaches 0.0698 MSE.

This is surprising because higher-dimensional Tiny ImageNet images might seem harder to recall, yet BayesPCN finds them easier than CIFAR10 images due to the greater separation between stored datapoints in the higher-dimensional space.

WHY IT MATTERS

What this unlocks for the field

BayesPCN enables continually learnable hierarchical associative memory that can one-shot write hundreds of high-dimensional observations without catastrophic forgetting.

Builders can attach BayesPCN as a robust external memory for controllers, gaining noise-tolerant auto-associative and hetero-associative recall without expensive offline retraining.
