Category

Memory Architecture

Architectures and systems for organizing, storing, and accessing AI memory.

54 papers

Memory Architecture

A Control Architecture for Training-Free Memory Use

Yanzhen Lu, Muchen Jiang et al.

2026

TAG combines uncertainty-based routing of low-confidence steps, guarded acceptance with rollback, bank selection across rule and exemplar memory, and evidence-based retirement for pruning, all inside a unified control loop. On SVAMP and ASDiv, TAG reaches 81.0% and 85.2% accuracy, up from the 74.0% and 77.5% no-memory baselines, while a compute-matched Retry baseline stays flat.

Survey · Benchmark · Agent Memory · Long-Term Memory · Memory Architecture

A Survey on the Security of Long-Term Memory in LLM Agents: Toward Mnemonic Sovereignty

Zehao Lin, Chunyu Li, Kai Chen

2026

Mnemonic Sovereignty analyzes the Write, Store, Retrieve, Execute, Share, and Forget/Rollback phases of long-term agent memory against integrity, confidentiality, availability, and governance objectives. Its lifecycle matrix shows that most of the ~70 surveyed works cluster on write and retrieve integrity, leaving the store phase, availability, and governance primitives such as write-gate validation and post-deletion verification almost entirely unexplored.

Memory Architecture

Auxiliary-predicted Compress Memory Model(ApCM Model): A Neural Memory Storage Model Based on Invertible Compression and Learnable Prediction

Weinuo Ou

2026

The Auxiliary-predicted Compress Memory Model (ApCM Model) combines an Invertible Dimensionality Reduction and Predictor (IDRP) module with a Memory Read-Write Controller that maintains a global Memory Bank, reads by cosine similarity, and writes according to an access-frequency policy. On random data, ApCM Model achieves a lower MSE than a Key-Value Memory Network (0.987171 vs 1.001440) while compressing memory from 1024 to 128 dimensions.
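To make the read/write controller concrete, here is a minimal sketch of a memory bank that reads by cosine similarity and, once full, overwrites the least-frequently-read slot. The class name `FrequencyMemoryBank` and this exact replacement rule are hypothetical assumptions for illustration, not the ApCM implementation; the IDRP compression step is omitted entirely.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class FrequencyMemoryBank:
    """HYPOTHETICAL memory bank: cosine-similarity read; once full, a write
    overwrites the least-frequently-read slot (an assumed policy)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots: list[list[float]] = []
        self.hits: list[int] = []  # read count per slot

    def read(self, query: list[float]) -> int:
        """Return the index of the most similar slot and record the access."""
        if not self.slots:
            raise LookupError("memory bank is empty")
        best = max(range(len(self.slots)), key=lambda i: cosine(query, self.slots[i]))
        self.hits[best] += 1
        return best

    def write(self, vector: list[float]) -> int:
        """Append while there is room; otherwise overwrite the least-read slot."""
        if len(self.slots) < self.capacity:
            self.slots.append(vector)
            self.hits.append(0)
            return len(self.slots) - 1
        victim = min(range(len(self.slots)), key=lambda i: self.hits[i])
        self.slots[victim] = vector
        self.hits[victim] = 0
        return victim
```

For example, with `capacity=2`, after two writes and one read that matches slot 0, a third write lands in slot 1, the slot that was never read.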

Memory Architecture

Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents

Natchanon Pollertlam, Witchayut Kornsuwannawit

2026

Beyond the Context Window compares a Mem0-based fact memory system (conversation segmentation, fact extraction, embedding and storage, and a retrieval mechanism) against long-context GPT-5-mini. On LongMemEval, long-context (LC) GPT-5-mini reaches 82.40% accuracy, 33.4 percentage points above the memory system.

Benchmark · Agent Memory · Memory Architecture

GAM: Hierarchical Graph-based Agentic Memory for LLM Agents

Zhaofen Wu, Hanrong Zhang et al.

2026

GAM builds a Hierarchical Graph Memory Architecture with a global Topic Associative Network, local Event Progression Graphs, State-Based Memory Consolidation, and Graph-Guided Multi-Factor Retrieval to decouple encoding from consolidation. On LoCoMo with Qwen2.5-7B, GAM attains an average F1 of 40.00 versus 35.38 for Mem0, and on LongDialQA with the same model it reaches 12.55 F1 versus 6.76 for MemoryOS.

Benchmark · Agent Memory · Long-Term Memory · Memory Architecture

Lightweight LLM Agent Memory with Small Language Models

Jiaquan Zhang, Chaoning Zhang et al.

2026

LightMem orchestrates three small language models (an SLM-1 Controller, an SLM-2 Selector, and an SLM-3 Writer) over STM, MTM, and LTM stores to modularize retrieval, writing, and offline consolidation. On LoCoMo, LightMem reaches 34.50 F1 on multi-hop questions with GPT-4o, +1.64 over A-MEM, while keeping median retrieval latency at 83 ms.

RAG · Agent Memory · Long-Term Memory · Memory Architecture

Memory as Metabolism: A Design for Companion Knowledge Systems

Stefan Miteski

2026

Memory as Metabolism defines companion knowledge systems with five retention operations (TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, AUDIT), plus memory gravity and minority-hypothesis retention, over a raw buffer, an active wiki, and cold memory. Rather than benchmark gains, its main result is a governance specification that separates descriptive, taxonomic, and normative claims and predicts improved coherence stability, fragility resistance, monoculture resistance, and effective minority-hypothesis influence for companion wikis.

Memory Architecture

ACON: Optimizing Context Compression for Long-horizon LLM Agents

Minki Kang, Wei-Ning Chen et al.

arXiv 2025

ACON combines History Compression, Observation Compression, Compression Guideline Optimization, and Compressor Distillation to rewrite agent histories and observations into concise, task-aware summaries. On AppWorld with gpt-4.1, ACON UTCO achieves 56.5% accuracy with 7.33k peak tokens, versus 56.0% accuracy with 9.93k peak tokens without compression.

RAG · Benchmark · Agent Memory · Long-Term Memory · Memory Architecture

Evaluating Long-Term Memory for Long-Context Question Answering

Alessandra Terranova, Björn Ross, Alexandra Birch

2025

Evaluating Long-Term Memory for Long-Context Question Answering compares Full Context, RAG, A-Mem, RAG+PromptOpt, and RAG+EpMem, covering semantic, episodic, and procedural memory for long conversational QA. On LoCoMo, RAG+EpMem achieves an average F1 rank of 1.83 with Llama 3.2-3B Instruct and 1.80 with GPT-4o mini, using around 1,000 tokens per query versus over 23,000 for Full Context.

Benchmark · Agent Memory · Memory Architecture

Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy-Aware Generative Agents

Saad Alqithami

2025

MaRS organizes agent memory into episodic, semantic, social, and task nodes with provenance, scores them with a privacy-aware retention controller, and governs forgetting with FIFO, LRU, Priority Decay, Reflection-Summary, Random-Drop, and Hybrid policies. On the FiFA benchmark, the Hybrid policy achieves a composite score of ≈0.911 across 300 runs and five memory budgets, outperforming the simpler policies while preserving privacy and cost efficiency.
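As a concrete illustration of one of the named policies, the sketch below implements plain LRU eviction under a fixed node budget with `collections.OrderedDict`. The `LRUMemory` class and its string node ids are hypothetical; MaRS's privacy-aware scoring and its Hybrid policy are not reproduced here.

```python
from collections import OrderedDict
from typing import Optional


class LRUMemory:
    """HYPOTHETICAL LRU-governed memory with a fixed budget of nodes."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.nodes = OrderedDict()  # node_id -> content, least recently used first

    def add(self, node_id: str, content: str) -> list:
        """Insert or refresh a node; return the ids evicted to stay within budget."""
        self.nodes[node_id] = content
        self.nodes.move_to_end(node_id)  # mark as most recently used
        evicted = []
        while len(self.nodes) > self.budget:
            old_id, _ = self.nodes.popitem(last=False)  # drop least recently used
            evicted.append(old_id)
        return evicted

    def recall(self, node_id: str) -> Optional[str]:
        """Return a node's content (or None) and mark it as recently used."""
        if node_id not in self.nodes:
            return None
        self.nodes.move_to_end(node_id)
        return self.nodes[node_id]
```

Recalling a node refreshes it, so under a budget of two, adding `a` and `b`, recalling `a`, then adding `c` evicts `b`.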

Agent Memory · Memory Architecture

Hindsight is 20/20: Building Agent Memory that Retains, Recalls, and Reflects

Chris Latimer, Nicoló Boschi et al.

2025

HINDSIGHT organizes agent memory into four networks via TEMPR and layers CARA on top to retain, recall, and reflect with explicit opinions and behavioral profiles. On LongMemEval, HINDSIGHT with Gemini-3 Pro scores 91.4% overall versus 60.2% for full-context GPT-4o, and HINDSIGHT with OSS-20B reaches 83.6% versus 39.0% for a full-context OSS-20B baseline.

RAG · Benchmark · Memory Architecture

Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Chulun Zhou, Chunkang Zhang et al.

2025

HGMEM represents working memory as a hypergraph, using Hypergraph-based Memory Storage, Adaptive Memory-based Evidence Retrieval, and Dynamic Memory Evolving to build high-order correlations across entities and facts. On Prelude long-narrative understanding, HGMEM with GPT-4o achieves 73.81% accuracy versus 72.22% for HippoRAG v2, and it reaches a comprehensiveness score of 69.74 on LongBench generative sense-making QA.

RAG · Benchmark · Memory Architecture

Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation

Jackson Hassell, Dan Zhang et al.

2025

Learning from Supervision with Semantic and Episodic Memory combines a performance agent, a critic agent, semantic memory, episodic memory, and a memory retriever to turn label-grounded critiques into reusable supervision without parameter updates. On the Multi-Condition Ranking dataset with Mixtral 8x22B and o4-mini as the critic, it reaches 85.6% accuracy, 24.8 percentage points above the EP_LABEL baseline at 60.8%.

Agent Memory · Long-Term Memory · Memory Architecture

LiCoMemory: Lightweight and Cognitive Agentic Memory for Efficient Long-Term Reasoning

Zhengjun Huang, Zhoujin Tian et al.

2025

LiCoMemory organizes long-term dialogue with CogniGraph, Query Processing and Integrated Rerank, and Real-Time Interactions, keeping session summaries, triples, and chunks linked. On LongMemEval with GPT-4o-mini, LiCoMemory reaches 73.80% accuracy and 76.63% recall, beating Mem0g by 9.0 and 7.1 points respectively.

Benchmark · Long-Term Memory · Memory Architecture

LightMem: Lightweight and Efficient Memory-Augmented Generation

Jizhan Fang, Xinle Deng et al.

2025

LightMem pipelines a Cognitive-Inspired Sensory Memory, a Topic Segmentation Submodule, a Topic-Aware Short-Term Memory, and a Long-Term Memory with Sleep-Time Update to filter, group, summarize, and asynchronously consolidate dialogue history. On LongMemEval-S with Qwen3-30B-A3B-Instruct-2507, LightMem reaches 70.20% accuracy versus 65.20% for A-MEM (+5.00 points) while reducing total token usage by up to 21.8× and API calls by up to 17.1×.

Benchmark · Agent Memory · Memory Architecture

MemEvolve: Meta-Evolution of Agent Memory Systems

Guibin Zhang, Haotian Ren et al.

2025

MemEvolve decomposes agent memory into Encode, Store, Retrieve, and Manage modules and meta-evolves these components through a dual evolution process over candidate architectures. With GPT-5-mini, MemEvolve raises Flash Searcher’s pass@1 on xBench DeepSearch from 69.0 to 74.0 and its WebWalkerQA accuracy from 58.82 to 61.18, while keeping API cost near 0.141 per query.

Memory Architecture

Memorization to Generalization: Emergence of Diffusion Models from Associative Memory

Bao Pham, Gabriel Raya et al.

arXiv 2025

Memorization to Generalization recasts diffusion training and sampling as Dense Associative Memory dynamics, analyzing memorized, spurious, and generalized states via energy basins and curvature. Memorization to Generalization shows that as training size grows on MNIST, CIFAR10, FASHION-MNIST, LSUN-CHURCH, and Stable Diffusion, spurious states peak at the memorization–generalization boundary and have distinct basin volume and curvature signatures.

RAG · Benchmark · Memory Architecture

Memory-Augmented Log Analysis with Phi-4-mini: Enhancing Threat Detection in Structured Security Logs

Anbi Guo, Mahfuza Farooque

2025

DM-RAG augments Phi-4-mini with a Short-Term Memory (STM) buffer, a Long-Term Memory (LTM) FAISS store, Bayesian fusion, and a logistic-regression confidence model for structured log analysis. On UNSW-NB15, DM-RAG reaches 98.70% recall and 69.59% F1, beating the Phi-4 + RAG (MITRE) baseline by 17.89 F1 points.

Survey · Cognitive Architecture · Memory Architecture

Memory-Augmented Transformers: A Systematic Review from Neuroscience Principles to Enhanced Model Architectures

Parsa Omidi, Xingshuai Huang et al.

arXiv 2025

Memory-Augmented Transformers organizes functional objectives, memory types, and integration techniques into a unified taxonomy that connects biological memory principles with concrete architectures such as Memformer, Titans, ATLAS, and EMAT. The review’s main result is a systematic three-dimensional classification that links dynamic multi-timescale memory, selective attention, and consolidation to specific Transformer designs and emerging lifelong-learning paradigms.

RAG · Benchmark · Memory Architecture

Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models

Jiaqi Cao, Jiarui Wang et al.

2025

Memory Decoder is pretrained to align its output distributions with kNN-LM distributions, and at inference an interpolation mechanism mixes its outputs with those of the base LLM without changing base parameters. On Wikitext-103, a 124M-parameter Memory Decoder reaches 13.36 perplexity on GPT2-small versus 14.76 for DAPT, and on specialized domains a single 0.5B Memory Decoder reduces the average perplexity of Qwen2-0.5B from 14.88 to 4.05.
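The inference-time mixing can be sketched as a kNN-LM-style linear interpolation of two next-token distributions, p = λ·p_mem + (1 − λ)·p_base. The weight `lam` and the toy token distributions are assumptions for illustration; the value Memory Decoder actually uses is not given in this summary.

```python
def interpolate(p_mem: dict[str, float], p_base: dict[str, float], lam: float) -> dict[str, float]:
    """Mix two next-token distributions: lam * p_mem + (1 - lam) * p_base.

    `lam` is an ASSUMED, fixed interpolation weight in [0, 1]; tokens missing
    from one distribution are treated as having probability 0 there.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_mem) | set(p_base)
    return {tok: lam * p_mem.get(tok, 0.0) + (1.0 - lam) * p_base.get(tok, 0.0) for tok in vocab}
```

Because both inputs are probability distributions and the two weights sum to one, the mixture is itself a valid distribution, and the base model's parameters are never touched.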

Pick · Memory Architecture

MemOS: A Memory OS for AI System

Zhiyu Li, Chenyang Xi et al.

arXiv 2025

MemOS introduces MemCube, MemScheduler, MemOperator, and MemLifecycle to treat plaintext, activation, and parameter memories as first-class resources with unified APIs and governance. MemOS reports state-of-the-art performance on PreFEval, PersonaMem, LongMemEval, and LoCoMo compared to MIRIX, Mem0, Zep, Memobase, MemU, and Supermemory, though the benchmark scores are only summarized qualitatively in Figure 1.

Benchmark · Memory Architecture

MMAG: Mixed Memory-Augmented Generation for Large Language Models Applications

Stefano Zeppieri

2025

MMAG organizes conversational memory, long-term user memory, episodic and event-linked memory, sensory and context-aware memory, and short-term working memory under a modular memory controller integrated with Heero’s encrypted Firestore and S3 stores. On the Heero language-learning platform, MMAG delivers a 20% increase in user retention and a 30% increase in average conversation duration relative to the pre-memory deployment.

RAG · Long-Term Memory · Memory Architecture

Mnemosyne: An Unsupervised, Human-Inspired Long-Term Memory Architecture for Edge-Based LLMs

Aneesh Jonelagadda, Christina Hahn et al.

2025

Mnemosyne combines a Commitment pipeline with substance and redundancy filters, a probabilistic Recall traversal over a graph-structured store, asynchronous Core Summary updates, and a Pruning module to manage long-term memory on edge devices. On LoCoMo, Mnemosyne reaches a 60.42% temporal-reasoning J-score and a 54.55% overall J-score, compared to 51.55% and 62.74% for Memory-R1, and in human evaluations it achieves a 65.8% win rate versus 31.07% for a naive RAG baseline.
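One plausible reading of a redundancy filter in a commitment pipeline is a near-duplicate check: skip committing a new memory if its embedding is too similar to one already stored. The cosine criterion, the 0.9 threshold, and the `commit_if_novel` helper are all assumptions; the summary does not describe Mnemosyne's actual filter.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def commit_if_novel(store, embedding, threshold=0.9):
    """HYPOTHETICAL redundancy filter: append `embedding` to `store` unless a
    stored embedding has cosine similarity >= threshold (assumed criterion).
    Returns True if the memory was committed."""
    if any(cosine(embedding, old) >= threshold for old in store):
        return False
    store.append(embedding)
    return True
```

A linear scan is enough for a sketch; a store of realistic size would use an approximate nearest-neighbor index for the same check.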

Memory Architecture

Muon Outperforms Adam in Tail-End Associative Memory Learning

Shuche Wang, Fengzhuo Zhang et al.

2025

Muon Outperforms Adam in Tail-End Associative Memory Learning analyzes how VO attention weights and FFN matrices behave under Muon versus Adam in transformer associative memories, measured by normalized SVD entropy and effective rank. The paper finds that applying Muon only to VO and FFN nearly recovers full-Muon validation loss (3.5654 vs 3.9242 for all-Adam at 10k steps on FineWeb) while improving tail-class accuracy on a heavy-tailed QA task compared to Adam and SGD+Momentum.

RAG · Benchmark · Agent Memory · Memory Architecture

Semantic Anchoring in Agentic Memory: Leveraging Linguistic Structures for Persistent Conversational Context

Maitreyi Chatterjee, Devansh Agarwal

2025

Semantic Anchoring enriches conversational memory with structured memory-representation tuples, a hybrid memory store with dense and symbolic indexes, and a retrieval scoring method. On MultiWOZ-Long, Semantic Anchoring reaches 83.5% factual recall and 80.8% discourse coherence, beating Entity-RAG by 7.6 and 8.6 points respectively.

RAG · Memory Architecture

TeleMem: Building Long-Term and Multimodal Memory for Agentic AI

Chunliang Chen, Ming Guan et al.

2025

TeleMem converts interactions into unified semantic nodes via the representation layer, organizes them in a memory graph with Insert and ReInsert, and reads them using closure-based retrieval and a ReAct-style multimodal agent. On ZH-4O, TeleMem reaches 86.33% QA Accuracy, beating the Mem0 baseline at 70.20% and the RAG baseline at 62.45%.

Memory Architecture

Test-time regression: a unifying framework for designing sequence models with associative memory

Ke Alexander Wang, Jiaxin Shi, Emily B. Fox

2025

Test-time regression treats memorization as regression over key-value pairs and memory retrieval as evaluating the fitted regressor, reinterpreting sequence architectures as test-time regression layers that solve a regression problem over the key-value pairs seen during the forward pass. This unification shows how linear attention, state space models, fast weight programmers, online learning layers, and softmax attention are all instances of the same framework, and it explains phenomena such as linear attention’s failures and the role of query-key normalization.
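A simplified way to see the regression view: linear attention stores pairs in a matrix M = Σ_i v_i k_iᵀ and retrieves with M q. This recovers a stored value exactly only when keys are orthonormal; with correlated keys, other pairs leak into the read. The sketch uses plain Python lists as stand-ins for tensors, and the helper names are illustrative assumptions.

```python
def outer_sum(keys, values):
    """Linear-attention memory M = sum_i v_i k_i^T, shape (d_v, d_k)."""
    d_v, d_k = len(values[0]), len(keys[0])
    M = [[0.0] * d_k for _ in range(d_v)]
    for k, v in zip(keys, values):
        for r in range(d_v):
            for c in range(d_k):
                M[r][c] += v[r] * k[c]
    return M


def read(M, q):
    """Retrieve with a query vector: M q."""
    return [sum(m * x for m, x in zip(row, q)) for row in M]


# Orthonormal keys: reading key [1, 0] returns exactly its stored value.
M_orth = outer_sum([[1.0, 0.0], [0.0, 1.0]], [[2.0, 3.0], [4.0, 5.0]])
# Correlated keys: the pair stored under [1, 1] interferes with the read.
M_corr = outer_sum([[1.0, 0.0], [1.0, 1.0]], [[2.0, 3.0], [4.0, 5.0]])
```

With the correlated keys, `read(M_corr, [1.0, 0.0])` returns `[6.0, 8.0]` instead of `[2.0, 3.0]`. A regression that accounts for correlations between keys avoids this kind of interference.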

Pick · Memory Architecture

Titans: Learning to Memorize at Test Time

Ali Behrouz, Peilin Zhong, Vahab Mirrokni

arXiv 2025

Titans combines a Core short-term attention block, a deep Long-term Memory module, and Persistent Memory tokens, with three integration variants: Memory as a Context (MAC), Memory as a Gate (MAG), and Memory as a Layer (MAL). On language modeling and reasoning benchmarks, Titans (MAC) at 760M parameters achieves 52.51 average accuracy vs 51.49 for Gated DeltaNet-H2, while also solving BABILong tasks that defeat GPT-4.
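The long-term memory in Titans is updated at test time by gradient steps on an associative loss, with a momentum term ("surprise") and a forgetting gate: S_t = η·S_{t−1} − θ·∇ℓ(M_{t−1}; x_t) and M_t = (1 − α)·M_{t−1} + S_t, where ℓ = ‖M k_t − v_t‖². The sketch below is a simplification under stated assumptions: the deep memory is replaced by a linear map M, and `eta`, `theta`, and `alpha` are fixed constants rather than the data-dependent rates described in the paper.

```python
def matvec(M, x):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def titans_step(M, S, k, v, eta=0.9, theta=0.1, alpha=0.01):
    """One test-time memory update (ASSUMED linear-memory simplification).

    loss = ||M k - v||^2, so grad = 2 (M k - v) k^T
    S <- eta * S - theta * grad     (momentum over past surprise)
    M <- (1 - alpha) * M + S        (forgetting gate, then update)
    """
    err = [p - t for p, t in zip(matvec(M, k), v)]
    new_S = [[eta * S[r][c] - theta * 2.0 * err[r] * k[c] for c in range(len(k))]
             for r in range(len(v))]
    new_M = [[(1.0 - alpha) * M[r][c] + new_S[r][c] for c in range(len(k))]
             for r in range(len(v))]
    return new_M, new_S
```

Repeatedly presenting the same key-value pair drives the loss toward zero. Because of the forgetting gate, the memory settles slightly short of the exact target instead of matching it perfectly.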

Memory Architecture

Understanding Transformer from the Perspective of Associative Memory

Shu Zhong, Mingyu Xu et al.

2025

Understanding Transformer from the Perspective of Associative Memory reframes softmax attention, linear attention, FFNs, and DeltaNet as instances of a unified associative memory with explicit memory capacity and update rules. Using this lens, the paper derives retrieval SNR for different kernels, unifies attention and FFNs, and proves that DeltaFormer achieves circuit complexity beyond TC0, reaching NC1 expressivity.
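In this associative-memory view, DeltaNet differs from linear attention mainly in its write rule. Linear attention adds v kᵀ. The delta rule first subtracts what the memory already returns for k: S ← S + β (v − S k) kᵀ. A minimal sketch, assuming a unit-norm key and β = 1, shows that rewriting a key overwrites its old value, while the additive rule accumulates both values.

```python
def delta_write(S, k, v, beta=1.0):
    """Delta-rule update: S <- S + beta * (v - S k) k^T."""
    pred = [sum(s * kc for s, kc in zip(row, k)) for row in S]
    return [[S[r][c] + beta * (v[r] - pred[r]) * k[c] for c in range(len(k))]
            for r in range(len(v))]


def additive_write(S, k, v):
    """Linear-attention update: S <- S + v k^T."""
    return [[S[r][c] + v[r] * k[c] for c in range(len(k))] for r in range(len(v))]


def read(S, q):
    """Retrieve with a query vector: S q."""
    return [sum(s * qc for s, qc in zip(row, q)) for row in S]
```

Writing `[1, 1]` and then `[3, -1]` under the same key `[1, 0]` reads back `[3.0, -1.0]` with the delta rule and `[4.0, 0.0]` with the additive rule.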

Agent Memory · Memory Architecture

WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation

Jiali Cheng, Anjishnu Kumar et al.

2025

WebATLAS combines a Planner, Actor, Critic, and Multi-layered Memory (Working Memory, Cognitive Map, Semantic Memory) to simulate and score actions before executing them on the web. On WebArena-Lite, WebATLAS achieves 63.0% average success versus 53.9% for Plan-and-Act, a +9.1 point gain without website-specific fine-tuning.

Benchmark · Memory Architecture

WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

Woongyeong Yeo, Kangsan Kim et al.

2025

WorldMM dynamically coordinates Episodic Memory, Semantic Memory, Visual Memory, an Adaptive Retrieval Agent, and a Response Agent to answer queries over hour- to week-long videos. On five long video QA benchmarks, WorldMM-GPT reaches 69.5% average accuracy, beating M3-Agent’s 55.1% by 14.4 points and the best prior memory baseline HippoRAG’s 57.0% by 12.5 points.

Memory Architecture

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

Yibo Jiang, Goutham Rajendran et al.

2024

Do LLMs dream of elephants studies how a self-attention layer, the value and embedding matrices, a latent concept association task, and context-hijacking prompts interact to implement associative memory in transformers. The paper proves theoretically (Theorem 1, Theorem 4) and shows empirically that a one-layer transformer can achieve arbitrarily small error on latent concept association by using the value matrix as an associative memory.

Memory Architecture

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

Yabin Zhang, Wenjie Zhu et al.

arXiv 2024

Dual Memory Networks combines a Dynamic Memory Network, Static Memory Network, a shared ReadOut module, Projection Layers ω, and a Memory Interactive Strategy to build sample-adaptive classifiers on top of frozen CLIP encoders. On zero-shot ImageNet with ViT-B/16, Dual Memory Networks achieves 72.25% accuracy vs 66.73% for CLIP and 68.98% for TPT, a +5.52 and +3.27 point gain respectively.

Memory Architecture

Self-evolving Agents with reflective and memory-augmented abilities

Xuechen Liang, Yangfan He et al.

2024

SAGE coordinates Iterative Feedback, Reflection, Short-Term Memory, Long-Term Memory, and MemorySyntax so that the assistant, checker, and user co-evolve policies and memories over time. On AgentBench and long-context QA benchmarks such as HotpotQA, SAGE raises GPT-3.5’s Database score from 25.9 to 37.6 and its HotpotQA answer accuracy from 48.5% to 68.3%.

Memory Architecture

Understanding Factual Recall in Transformers via Associative Memories

Eshaan Nichani, Jason D. Lee, Alberto Bietti

2024

Understanding Factual Recall in Transformers via Associative Memories analyzes linear associative memories, MLP associative memories, and a one-layer transformer with multi-head self-attention plus an MLP on a synthetic factual recall task. The paper proves that storing N random associations requires Θ(N log M) bits and that a single-layer transformer achieves 100% accuracy whenever either the self-attention or the MLP parameter count scales linearly with the number of facts.
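A toy linear associative memory makes the storage idea concrete. Subjects and attributes get random ±1 embeddings, facts are stored as W = Σ_i y_i x_iᵀ, and recall scores every attribute embedding against W x and takes the argmax. The dimension `d = 128`, the `n = 8` facts, and the seed are arbitrary assumptions. This sketch illustrates the mechanism only; it does not reproduce the paper's one-layer transformer or its capacity bounds.

```python
import random


def random_sign_vector(rng, d):
    """Random +/-1 embedding; nearly orthogonal to others when d is large."""
    return [rng.choice((-1, 1)) for _ in range(d)]


def store(facts, d):
    """Linear associative memory W = sum_i y_i x_i^T over (x, y) pairs."""
    W = [[0] * d for _ in range(d)]
    for x, y in facts:
        for r in range(d):
            row, yr = W[r], y[r]
            for c in range(d):
                row[c] += yr * x[c]
    return W


def recall(W, x, candidates):
    """Index of the candidate output embedding that best matches W x."""
    h = [sum(w * xc for w, xc in zip(row, x)) for row in W]
    scores = [sum(a * b for a, b in zip(h, y)) for y in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


rng = random.Random(0)
d, n = 128, 8  # ASSUMED sizes: dimension well above the number of facts
subjects = [random_sign_vector(rng, d) for _ in range(n)]
attributes = [random_sign_vector(rng, d) for _ in range(n)]
W = store(list(zip(subjects, attributes)), d)
recalled = [recall(W, s, attributes) for s in subjects]
```

With d much larger than n, the cross-talk between stored pairs stays small relative to the d² signal, so every subject recalls its own attribute. Shrinking d or adding many more facts makes the interference visible.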

Memory Architecture

Hierarchical Neural Memory Network for Low Latency Event Processing

Ryuhei Hamaguchi, Yasutaka Furukawa et al.

arXiv 2023

Hierarchical Neural Memory Network (HMNet) stacks multi-level latent memories z1–z3 with Event-write, Up-write, Down-write, Update, and Readout operations driven by Event Sparse Cross Attention. On DSEC-Semantic, HMNet-L3 reaches 57.4 mIoU with event–RGB fusion, improving over a ResNet-50 baseline at 54.1 mIoU while also reducing latency; on GEN1, HMNet-B1 performs on par with AED in mAP at 57% lower latency.

Memory Architecture

MBPTrack: Improving 3D Point Cloud Tracking with Memory Networks and Box Priors

Tian-Xing Xu, Yuan-Chen Guo et al.

arXiv 2023

MBPTrack combines a Decoupling Feature Propagation Module, BPLocNet, box-prior sampling, and point-to-reference aggregation to track 3D objects from point clouds using temporal memory and size-aware localization. On KITTI, MBPTrack achieves 70.3% Success and 87.9% Precision, improving over CXTrack’s 67.5%/85.3% by +2.8/+2.6.

Memory Architecture

BayesPCN: A Continually Learnable Predictive Coding Associative Memory

Jason Yoo, Frank Wood

arXiv 2022

BayesPCN combines predictive coding, conjugate Bayesian updates over the weights W_{0:L}, sequential importance sampling, and a diffusion-based forget mechanism to build a hierarchical associative memory that supports continual one-shot writes. On CIFAR10 and Tiny ImageNet hetero-associative tasks, BayesPCN matches offline GPCN with MSE as low as 0.0000, whereas online GPCN’s MSE rises to 0.0791 on masked CIFAR10 at sequence length 1024.

Memory Architecture

Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models

Beren Millidge, Tommaso Salvatori et al.

2022

Universal Hopfield Networks decompose single-shot associative memory into three operations (similarity, separation, and projection), instantiating Hopfield networks, sparse distributed memories, dense associative memories, and modern continuous Hopfield networks within one energy-based framework. The paper then compares dot-product, Euclidean, Manhattan, and other similarity functions, finding that Manhattan and Euclidean similarity often yield higher retrieval capacity and robustness than the dot product used by modern continuous Hopfield networks on MNIST, CIFAR10, and Tiny ImageNet.
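The three-stage decomposition can be written as z = P · sep(sim(M, q)). A minimal sketch under these assumptions: memories are the rows of M, the projection is auto-associative (P = M, so retrieval returns a blend of stored patterns), `sim` is either the dot product or negative Manhattan distance, and `sep` is a softmax with inverse temperature `beta`. The function names and `beta = 10` are illustrative choices, not the paper's code.

```python
import math


def dot_sim(m, q):
    """Dot-product similarity, as in modern continuous Hopfield networks."""
    return sum(a * b for a, b in zip(m, q))


def manhattan_sim(m, q):
    """Negative L1 distance: larger means more similar."""
    return -sum(abs(a - b) for a, b in zip(m, q))


def softmax_sep(scores, beta):
    """Separation function: softmax with inverse temperature beta."""
    top = max(scores)
    exps = [math.exp(beta * (s - top)) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(memories, query, sim, beta=10.0):
    """z = P^T sep(sim(M, q)) with the ASSUMED auto-associative projection P = M."""
    weights = softmax_sep([sim(m, query) for m in memories], beta)
    dim = len(memories[0])
    return [sum(w * m[i] for w, m in zip(weights, memories)) for i in range(dim)]
```

Swapping `dot_sim` for `manhattan_sim` changes only the similarity stage, which is the kind of comparison the paper runs at scale. On a single toy query both functions retrieve the same pattern, so the sketch does not by itself show Manhattan's advantage.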

Memory Architecture

Hierarchical Associative Memory

Dmitry Krotov

arXiv 2021

Hierarchical Associative Memory organizes fully recurrent Modern Hopfield Networks into layered architectures using Lagrangian functions, hierarchical time scales, and symmetric feedforward–feedback weights as core components. Hierarchical Associative Memory theoretically extends Dense Associative Memories to arbitrary depth and local connectivity, deriving explicit dynamical and energy functions rather than reporting benchmark numbers.

Memory Architecture

Emergent Symbols through Binding in External Memory

Taylor W. Webb, Ishan Sinha, Jonathan D. Cohen

arXiv 2020

The Emergent Symbol Binding Network (ESBN) combines an LSTM controller, a shared image encoder f_e, temporal context normalization, and a two-column key-value memory to bind abstract variables to concrete image embeddings. ESBN achieves ≥95% test accuracy on the same/different, RMTS, distribution-of-three, and identity-rules tasks while generalizing to withheld Unicode characters, unlike the LSTM, NTM, MNM, Relation Net, Transformer, and PrediNet baselines.

Memory Architecture

Large Associative Memory Problem in Neurobiology and Machine Learning

Dmitry Krotov, John Hopfield

arXiv 2020

Large Associative Memory Problem reformulates associative memory with coupled feature neurons and memory neurons, an energy function, and Lagrangian functions, so that all interactions are pairwise yet the model recovers dense and modern Hopfield behavior. With appropriate choices of activation and Lagrangian functions, the effective dynamics match Dense Associative Memories and modern Hopfield networks, enabling storage of N_mem ∼ min(N_f^{n−1}, N_h) memories, or even a number exponential in N_f, without many-body synapses.

Memory Architecture

MEMO: A Deep Network for Flexible Combination of Episodic Memories

Andrea Banino, Adrià Puigdomènech Badia et al.

2020

MEMO combines common embeddings, multi-head keys and values, recurrent attention, and a halting policy to flexibly chain episodic memories over multiple hops. On joint bAbI 10k, MEMO achieves 0.21% error versus 4.2% for Memory Networks, and it also solves long-distance Paired Associative Inference and shortest-path tasks.

Memory Architecture

Self-Attentive Associative Memory

Hung Le, Truyen Tran, Svetha Venkatesh

2020

Self-Attentive Associative Memory (STM) combines Outer Product Attention (OPA), Self-attentive Associative Memory (SAM), M_i-Write, M_r-Read, and M_r-Transfer into a dual item–relational memory system. On the bAbI question answering benchmark, STM attains 0.39 ± 0.18 mean error versus 0.55 ± 0.74 for MNM-p, a new state of the art.

Memory Architecture

Adaptive Posterior Learning: few-shot learning with a surprise-based memory module

Tiago Ramalho, Marta Garnelo

arXiv 2019

Adaptive Posterior Learning combines an Encoder, Memory store, Memory controller, and Decoder (relational self-attention, relational working memory, or LSTM) to approximate posteriors from a sparse external memory. On Omniglot, Adaptive Posterior Learning achieves 99.9% 5-way 5-shot accuracy, matching MAML and SNAIL, while using fewer than 2 stored examples per class.
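The surprise-based write rule can be sketched as follows: store an example only when the model's predicted probability for its true label is low, that is, when its surprise −log p(y) exceeds a threshold. The `SurpriseMemory` class and the threshold value are assumptions for illustration. The encoder, decoder, and posterior approximation are not modeled; the caller supplies the predicted class distribution.

```python
import math


class SurpriseMemory:
    """HYPOTHETICAL external memory that only stores surprising examples."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold  # ASSUMED surprise threshold, in nats
        self.store: list = []

    def observe(self, x, label: int, probs: list) -> bool:
        """`probs` is the model's predicted class distribution for x, computed
        before the label is revealed. Writes (x, label) when the surprise
        -log p(label) exceeds the threshold; returns True if written."""
        p = max(probs[label], 1e-12)  # guard against log(0)
        surprise = -math.log(p)
        if surprise > self.threshold:
            self.store.append((x, label))
            return True
        return False
```

As the model grows more confident on a class, fewer of its examples clear the threshold, which is one way a memory can stay as sparse as the "fewer than 2 stored examples per class" reported above.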

Memory Architecture

Progressive Memory Banks for Incremental Domain Adaptation

Nabiha Asghar, Lili Mou et al.

arXiv 2018

Progressive Memory Banks for Incremental Domain Adaptation augments a BiLSTM with a directly parameterized memory bank, a key-value memory, and an attention-based memory mechanism, and progressively expands the memory during incremental domain adaptation. On MultiNLI Fic→Gov, the model with memory and vocabulary expansion reaches 67.55% on Fic and 70.82% on Gov, compared to 65.62% and 69.90% for fine-tuning without memory expansion.

Memory Architecture

Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory

Hao Zhou, Minlie Huang et al.

arXiv 2017

Emotional Chatting Machine (ECM) augments a GRU encoder-decoder with Emotion Category Embedding, Internal Memory, and External Memory to control the emotional content of generated replies. On the Emotional STC dataset, ECM reaches 0.773 emotion accuracy versus 0.724 for Emb and 0.179 for Seq2Seq, while keeping perplexity comparable.