Memory Architecture
Yanzhen Lu, Muchen Jiang et al.
· 2026
TAG combines uncertainty-based routing of low-confidence steps, guarded acceptance with rollback, bank selection across rule and exemplar memory, and evidence-based retirement inside a unified control loop. On SVAMP and ASDiv, TAG reaches 81.0% and 85.2% accuracy, improving over the 74.0% and 77.5% no-memory baselines while a compute-matched Retry baseline stays flat.
RAG · Benchmark · Agent Memory · Memory Architecture
Xingyu Lyu, Jianfeng He et al.
· 2026
ADAM combines Anchor extraction, Distribution estimation, Anchor selection, and Query generation to adaptively probe agent memory via an auxiliary generator and entropy-based selection. On the EHRAgent benchmark with Llama2-7b-chat, ADAM reaches EQ=77 and ASR=1.00, compared to MEXTRA’s EQ=44 and ASR=0.89.
Benchmark · Memory Architecture
Shannan Yan, Jingchen Ni et al.
· 2026
AdaMem organizes dialogue history into Working Memory, Episodic Memory, Persona Memory, and Graph Memory coordinated by a Memory Agent, Research Agent, and Working Agent. On LoCoMo with GPT-4.1-mini, AdaMem achieves 44.65 F1 overall, beating the best baseline LangMem at 41.76 F1 by +2.89.
Memory Architecture
Pratyay Banerjee, Masud Moshtaghi et al.
· 2026
APEX-MEM constructs a semi-structured temporal property graph using Ontology, Entity and Property Resolution, Fact Extraction, and Graph Agents to store and query conversational memory. On LoCoMo, APEX-MEM with GPT-5 achieves 88.88% overall accuracy, beating MIRIX at 85.38% by 3.50 percentage points.
Survey · Benchmark · Agent Memory · Long-Term Memory · Memory Architecture
Zehao Lin, Chunyu Li, Kai Chen
· 2026
Mnemonic Sovereignty analyzes the long-term Write, Store, Retrieve, Execute, Share, and Forget/Rollback phases of agent memory against integrity, confidentiality, availability, and governance objectives. Its lifecycle matrix shows that most of the ~70 surveyed works cluster on write and retrieve integrity, leaving store, availability, and governance primitives such as write-gate validation and post-deletion verification almost entirely unexplored.
Memory Architecture
Auxiliary-predicted Compress Memory Model (ApCM Model) combines an Invertible Dimensionality Reduction and Predictor (IDRP) module with a Memory Read-Write Controller, including a global Memory Bank, cosine-similarity read, and access-frequency write policy. ApCM Model achieves lower MSE (0.987171 vs 1.001440) than a Key-Value Memory Network while compressing memory from 1024 to 128 dimensions on random data.
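A cosine-similarity read paired with an access-frequency write policy, as named in the ApCM summary, can be illustrated with a minimal sketch. This is not the ApCM implementation: the `MemoryBank` class, the hard (argmax) read, and the rule "overwrite the least-read slot" are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryBank:
    """Hypothetical fixed-size memory bank (illustrative, not from the paper)."""

    def __init__(self, num_slots, dim):
        self.slots = [[0.0] * dim for _ in range(num_slots)]
        self.reads = [0] * num_slots  # per-slot access counts

    def read(self, query):
        """Return (index, vector) of the slot most similar to the query."""
        scores = [cosine(query, slot) for slot in self.slots]
        best = max(range(len(scores)), key=scores.__getitem__)
        self.reads[best] += 1
        return best, self.slots[best]

    def write(self, vector):
        """Assumed policy: overwrite the least-read slot and reset its count."""
        victim = min(range(len(self.reads)), key=self.reads.__getitem__)
        self.slots[victim] = list(vector)
        self.reads[victim] = 0
        return victim


bank = MemoryBank(num_slots=2, dim=2)
bank.write([1.0, 0.0])        # lands in slot 0 (ties go to the first slot)
bank.read([1.0, 0.0])         # slot 0 is now the more frequently read slot
bank.write([0.0, 1.0])        # so this write goes to slot 1
```

Ties between equally cold slots resolve to the lowest index here; a real system would likely need a tie-breaker such as age.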
Memory Architecture
Natchanon Pollertlam, Witchayut Kornsuwannawit
· 2026
Beyond the Context Window compares Conversation Segmentation, Fact Extraction, Embedding and Storage, and Retrieval Mechanism in a Mem0-based memory system against long-context GPT-5-mini. On LongMemEval, Beyond the Context Window finds LC GPT-5-mini reaches 82.40% accuracy, 33.4 percentage points above the memory system baseline.
Benchmark · Memory Architecture
Andy Nguyen, Danh Doan et al.
· 2026
ByteRover organizes memory through an LLM-driven Agent Layer, a sequential Execution Layer, and a file-based Context Tree with an Adaptive Knowledge Lifecycle. On LoCoMo, ByteRover reaches 96.1% overall accuracy versus 89.9% for HonCho, and on LongMemEval-S ByteRover scores 92.8% overall.
Benchmark · Agent Memory · Memory Architecture
Zhaofen Wu, Hanrong Zhang et al.
· 2026
GAM builds a Hierarchical Graph Memory Architecture with a global Topic Associative Network, local Event Progression Graphs, State-Based Memory Consolidation, and Graph-Guided Multi-Factor Retrieval to decouple encoding from consolidation. On LoCoMo with Qwen2.5-7B, GAM attains an Average F1 of 40.00 compared to Mem0’s 35.38, and on LongDialQA with Qwen2.5-7B, GAM reaches 12.55 F1 vs MemoryOS at 6.76.
Benchmark · Agent Memory · Long-Term Memory · Memory Architecture
Jiaquan Zhang, Chaoning Zhang et al.
· 2026
LightMem orchestrates an SLM-1 Controller, SLM-2 Selector, SLM-3 Writer, and STM, MTM, and LTM stores to modularize retrieval, writing, and offline consolidation. On LoCoMo, LightMem reaches 34.50 F1 on multi-hop questions with GPT-4o, +1.64 over A-MEM, while keeping median retrieval latency at 83 ms.
Agent Memory · Memory Architecture
Ziliang Guo, Ziheng Li et al.
· 2026
MemFactory decomposes memory agents into Module Layer, Agent Layer, Environment Layer, and Trainer Layer with plug-and-play Extractor, Updater, Retriever, and RecurrentMemoryModule components. On MemAgent eval_50, MemFactory raises Qwen3-1.7B from 0.4727 to 0.5684 and Qwen3-4B-Instruct from 0.6523 to 0.7051 using GRPO.
RAG · Agent Memory · Long-Term Memory · Memory Architecture
Memory as Metabolism defines companion knowledge systems with five retention operations (TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, AUDIT) plus memory gravity and minority-hypothesis retention over a raw buffer, active wiki, and cold memory. Instead of benchmark gains, Memory as Metabolism’s main result is a governance specification that separates descriptive, taxonomic, and normative claims and predicts improved coherence stability, fragility resistance, monoculture resistance, and effective minority-hypothesis influence for companion wikis.
Memory Architecture
Minki Kang, Wei-Ning Chen et al.
arXiv 2025 · 2025
ACON combines History Compression, Observation Compression, Compression Guideline Optimization, and Compressor Distillation to rewrite agent histories and observations into concise, task-aware summaries. On AppWorld, ACON UTCO with gpt-4.1 achieves 56.5% accuracy with 7.33k peak tokens, versus 56.0% accuracy with 9.93k peak tokens for No compression.
RAG · Benchmark · Agent Memory · Long-Term Memory · Memory Architecture
Alessandra Terranova, Björn Ross, Alexandra Birch
· 2025
Evaluating Long-Term Memory for Long-Context Question Answering compares Full Context, RAG, A-Mem, RAG+PromptOpt, and RAG+EpMem memory components across semantic, episodic, and procedural memory for long conversational QA. On LoCoMo, RAG+EpMem reaches an average F1 ranking of 1.83 for Llama 3.2-3B Instruct and 1.80 for GPT-4o mini while using around 1,000 tokens per query versus over 23,000 for Full Context.
Benchmark · Agent Memory · Memory Architecture
MaRS organizes agent memory into episodic, semantic, social, and task nodes with provenance, scored by a privacy-aware retention controller and governed by FIFO, LRU, Priority Decay, Reflection-Summary, Random-Drop, and Hybrid policies. On the FiFA benchmark, the Hybrid policy in MaRS achieves a composite score of ≈0.911 across 300 runs and five memory budgets, outperforming simpler policies while preserving privacy and cost efficiency.
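The two simplest retention policies named in the MaRS summary, FIFO and LRU, can be sketched under a fixed memory budget. This is an illustrative sketch only: the `BudgetedMemory` class and measuring the budget in items (rather than tokens or bytes) are assumptions, and the privacy-aware scoring, Priority Decay, Reflection-Summary, and Hybrid policies are not modelled.

```python
from collections import OrderedDict


class BudgetedMemory:
    """Hypothetical item-count-bounded store with FIFO or LRU eviction."""

    def __init__(self, budget, policy="lru"):
        if policy not in ("fifo", "lru"):
            raise ValueError("policy must be 'fifo' or 'lru'")
        self.budget = budget
        self.policy = policy
        self.items = OrderedDict()  # oldest (or least recently used) first

    def get(self, key):
        """Return the stored value, or None; under LRU a hit refreshes recency."""
        if key not in self.items:
            return None
        if self.policy == "lru":
            self.items.move_to_end(key)
        return self.items[key]

    def put(self, key, value):
        """Store a value and return the list of keys evicted to stay in budget."""
        if key in self.items:
            self.items[key] = value
            if self.policy == "lru":
                self.items.move_to_end(key)
            return []
        self.items[key] = value
        evicted = []
        while len(self.items) > self.budget:
            old_key, _ = self.items.popitem(last=False)
            evicted.append(old_key)
        return evicted


lru = BudgetedMemory(budget=2, policy="lru")
lru.put("a", "first note")
lru.put("b", "second note")
lru.get("a")                       # touching "a" makes "b" the eviction candidate
print(lru.put("c", "third note"))  # evicts "b"; FIFO would have evicted "a"
```

The only difference between the two policies in this sketch is whether a read or update moves the entry to the back of the queue.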
Agent Memory · Memory Architecture
Chris Latimer, Nicoló Boschi et al.
· 2025
HINDSIGHT organizes agent memory into four networks via TEMPR and layers CARA on top to retain, recall, and reflect with explicit opinions and behavioral profiles. On LongMemEval, HINDSIGHT with Gemini-3 Pro scores 91.4% overall versus 60.2% for full-context GPT-4o, while HINDSIGHT with OSS-20B jumps from 39.0% to 83.6% over a full-context OSS-20B baseline.
RAG · Benchmark · Memory Architecture
Chulun Zhou, Chunkang Zhang et al.
· 2025
HGMEM represents working memory as a hypergraph with Hypergraph-based Memory Storage, Adaptive Memory-based Evidence Retrieval, and Dynamic Memory Evolving to build high-order correlations across entities and facts. On Prelude long narrative understanding, HGMEM with GPT-4o achieves 73.81% accuracy compared to 72.22% for HippoRAG v2, while also reaching 69.74 comprehensiveness on LongBench generative sense-making QA.
RAG · Benchmark · Memory Architecture
Jackson Hassell, Dan Zhang et al.
· 2025
Learning from Supervision with Semantic and Episodic Memory combines a performance agent, critic agent, semantic memory, episodic memory, and memory retriever to turn label-grounded critiques into reusable supervision without parameter updates. On the Multi-Condition Ranking dataset with Mixtral 8x22B and o4-mini as critic, Learning from Supervision with Semantic and Episodic Memory reaches 85.6% accuracy, a 24.8-point gain over the EP_LABEL baseline at 60.8%.
Agent Memory · Long-Term Memory · Memory Architecture
Zhengjun Huang, Zhoujin Tian et al.
· 2025
LiCoMemory organizes long-term dialogue with CogniGraph, Query Processing and Integrated Rerank, and Real Time Interactions to keep session summaries, triples, and chunks linked. On LongMemEval with GPT-4o-mini, LiCoMemory reaches 73.80% accuracy and 76.63% recall, beating Mem0g by 9.0 and 7.1 points.
Benchmark · Long-Term Memory · Memory Architecture
Jizhan Fang, Xinle Deng et al.
· 2025
LightMem pipelines a Cognitive-Inspired Sensory Memory, Topic Segmentation Submodule, Topic-Aware Short-Term Memory, and Long-Term Memory with Sleep-Time Update to filter, group, summarize, and asynchronously consolidate dialogue history. On LongMemEval-S with Qwen3-30B-A3B-Instruct-2507, LightMem reaches 70.20% ACC vs 65.20% for A-MEM (+5.00 points) while reducing total token usage by up to 21.8× and API calls by up to 17.1×.
Benchmark · Agent Memory · Memory Architecture
Guibin Zhang, Haotian Ren et al.
· 2025
MemEvolve decomposes agent memory into Encode, Store, Retrieve, and Manage modules and meta-evolves these components via a dual-evolution process over candidate architectures. With GPT-5-mini, MemEvolve raises Flash Searcher pass@1 on xBench DeepSearch from 69.0 to 74.0 and WebWalkerQA accuracy from 58.82 to 61.18 while keeping API cost near 0.141 per query.
Memory Architecture
Bao Pham, Gabriel Raya et al.
arXiv 2025 · 2025
Memorization to Generalization recasts diffusion training and sampling as Dense Associative Memory dynamics, analyzing memorized, spurious, and generalized states via energy basins and curvature. Memorization to Generalization shows that as training size grows on MNIST, CIFAR10, FASHION-MNIST, LSUN-CHURCH, and Stable Diffusion, spurious states peak at the memorization–generalization boundary and have distinct basin volume and curvature signatures.
RAG · Benchmark · Memory Architecture
Anbi Guo, Mahfuza Farooque
· 2025
DM-RAG augments Phi-4-mini with a Short-Term Memory (STM) buffer, Long-Term Memory (LTM) FAISS store, Bayesian fusion, and a logistic regression confidence model for structured log analysis. On UNSW-NB15, DM-RAG reaches 98.70% recall and 69.59% F1, beating the Phi-4 + RAG (MITRE) baseline in F1 by 17.89 points.
Survey · Cognitive Architecture · Memory Architecture
Parsa Omidi, Xingshuai Huang et al.
arXiv 2025 · 2025
Memory-Augmented Transformers organizes functional objectives, memory types, and integration techniques into a unified taxonomy that connects biological memory principles with concrete architectures like Memformer, Titans, ATLAS, and EMAT. Memory-Augmented Transformers’ main result is a systematic three-dimensional classification that links dynamic multi-timescale memory, selective attention, and consolidation to specific Transformer designs and emerging lifelong-learning paradigms.
RAG · Benchmark · Memory Architecture
Jiaqi Cao, Jiarui Wang et al.
· 2025
Memory Decoder combines a Pre-training stage that aligns with kNN-LM distributions and an Inference interpolation mechanism that mixes Memory Decoder and base LLM outputs without changing base parameters. On Wikitext-103, Memory Decoder with 124M parameters reaches 13.36 perplexity on GPT2-small versus 14.76 for DAPT, and on specialized domains a single 0.5B Memory Decoder reduces average perplexity from 14.88 to 4.05 on Qwen2-0.5B.
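The inference-time mixing described above follows the kNN-LM pattern of linearly interpolating two next-token distributions. A minimal sketch, assuming both models expose a token-to-probability mapping; the function name `interpolate` and the weight `lam` are illustrative choices, not names from the paper.

```python
def interpolate(p_base, p_mem, lam):
    """Mix two next-token distributions: lam * p_mem + (1 - lam) * p_base.

    p_base and p_mem map tokens to probabilities; tokens missing from one
    distribution are treated as having probability 0.0 there.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    tokens = set(p_base) | set(p_mem)
    return {
        tok: lam * p_mem.get(tok, 0.0) + (1.0 - lam) * p_base.get(tok, 0.0)
        for tok in tokens
    }


# The base model is undecided; the memory component is confident in "aspirin".
p_base = {"aspirin": 0.5, "apple": 0.5}
p_mem = {"aspirin": 1.0}
mixed = interpolate(p_base, p_mem, lam=0.5)
print(sorted(mixed.items()))
```

Because the mix is a convex combination, the result remains a valid probability distribution whenever both inputs are, and the base model's parameters are never touched.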
Pick · Memory Architecture
Zhiyu Li, Chenyang Xi et al.
arXiv 2025 · 2025
MemOS introduces MemCube, MemScheduler, MemOperator, and MemLifecycle to treat plaintext, activation, and parameter memories as first-class resources with unified APIs and governance. MemOS achieves state-of-the-art performance across PreFEval, PersonaMem, LongMemEval, and LoCoMo compared to MIRIX, Mem0, Zep, Memobase, MemU, and Supermemory, though exact benchmark scores are only summarized qualitatively in Figure 1.
Benchmark · Memory Architecture
MMAG organizes conversational memory, long-term user memory, episodic and event-linked memories, sensory and context-aware memory, and short-term working memory under a modular memory controller integrated with Heero’s encrypted Firestore and S3 stores. MMAG delivers a 20% increase in user retention and a 30% increase in average conversation duration on the Heero language learning platform compared to its pre-memory deployment.
RAG · Long-Term Memory · Memory Architecture
Aneesh Jonelagadda, Christina Hahn et al.
· 2025
Mnemosyne combines a Commitment pipeline with substance and redundancy filters, a probabilistic Recall traversal over a graph-structured store, asynchronous Core Summary updates, and a Pruning module to manage long-term memory on edge devices. On the LoCoMo benchmark, Mnemosyne reaches a 60.42% temporal reasoning J-score, ahead of Memory-R1 at 51.55%, but a 54.55% overall J-score, behind Memory-R1 at 62.74%; in human evaluations it achieves a 65.8% win rate over a 31.07% naive RAG baseline.
Memory Architecture
Shuche Wang, Fengzhuo Zhang et al.
· 2025
Muon Outperforms Adam in Tail-End Associative Memory Learning analyzes how VO attention weights, FFN matrices, normalized SVD entropy, and effective rank behave under Muon versus Adam in transformer associative memories. Muon Outperforms Adam in Tail-End Associative Memory Learning finds that applying Muon to VO and FFN nearly recovers full-Muon validation loss (3.5654 vs 3.9242 for All Adam at 10k steps on FineWeb) while improving tail-class accuracy on a heavy-tailed QA task compared to Adam and SGD+Momentum.
RAG · Benchmark · Agent Memory · Memory Architecture
Maitreyi Chatterjee, Devansh Agarwal
· 2025
Semantic Anchoring enriches conversational memory by combining a hybrid memory store with dense and symbolic indexes, structured memory representation tuples, hybrid storage and indexing, and a retrieval scoring method. On MultiWOZ-Long, Semantic Anchoring reaches 83.5% Factual Recall and 80.8% Discourse Coherence, beating Entity-RAG by 7.6 and 8.6 points respectively.
RAG · Memory Architecture
Chunliang Chen, Ming Guan et al.
· 2025
TeleMem converts interactions into unified semantic nodes via the representation layer, organizes them in a memory graph with Insert and ReInsert, and reads them using closure-based retrieval and a ReAct-style multimodal agent. On ZH-4O, TeleMem reaches 86.33% QA Accuracy, beating the Mem0 baseline at 70.20% and the RAG baseline at 62.45%.
Memory Architecture
Ke Alexander Wang, Jiaxin Shi, Emily B. Fox
· 2025
Test-time regression uses memorization as regression, memory retrieval, and test-time regression layers to reinterpret sequence architectures as solving a regression problem over key-value pairs during the forward pass. This unification shows how linear attention, state space models, fast weight programmers, online learning layers, and softmax attention are all instances of the same framework and explains phenomena like linear attention’s failures and the role of query-key normalization.
Pick · Memory Architecture
Ali Behrouz, Peilin Zhong, Vahab Mirrokni
arXiv 2025 · 2025
Titans combines a Core short-term attention block, a deep Long-term Memory module, and Persistent Memory tokens, with three integration variants: Memory as a Context (MAC), Memory as a Gate (MAG), and Memory as a Layer (MAL). On language modeling and reasoning benchmarks, Titans (MAC) at 760M parameters achieves 52.51 average accuracy vs 51.49 for Gated DeltaNet-H2, while also solving BABILong tasks that defeat GPT-4.
Memory Architecture
Shu Zhong, Mingyu Xu et al.
· 2025
Understanding Transformer from the Perspective of Associative Memory reframes Softmax Attention, Linear Attention, FFN, and DeltaNet as instances of a unified associative memory with explicit memory capacity and update rules. Using this lens, Understanding Transformer from the Perspective of Associative Memory derives retrieval SNR for different kernels, unifies attention and FFNs, and proves that DeltaFormer achieves circuit complexity beyond TC0, reaching NC1 expressivity.
Agent Memory · Memory Architecture
Jiali Cheng, Anjishnu Kumar et al.
· 2025
WebATLAS combines a Planner, Actor, Critic, and Multi-layered Memory (Working Memory, Cognitive Map, Semantic Memory) to simulate and score actions before executing them on the web. On WebArena-Lite, WebATLAS achieves 63.0% average success versus 53.9% for Plan-and-Act, a +9.1 point gain without website-specific fine-tuning.
Benchmark · Memory Architecture
Woongyeong Yeo, Kangsan Kim et al.
· 2025
WorldMM dynamically coordinates Episodic Memory, Semantic Memory, Visual Memory, an Adaptive Retrieval Agent, and a Response Agent to answer queries over hour- to week-long videos. On five long video QA benchmarks, WorldMM-GPT reaches 69.5% average accuracy, beating M3-Agent’s 55.1% by 14.4 points and the best prior memory baseline HippoRAG’s 57.0% by 12.5 points.
Memory Architecture
Yibo Jiang, Goutham Rajendran et al.
· 2024
Do LLMs dream of elephants studies how a self-attention layer, value matrix, embedding matrix, latent concept association task, and context hijacking prompts interact to implement associative memory in transformers. Do LLMs dream of elephants proves theoretically (Theorem 1, Theorem 4) and empirically that a one-layer transformer can achieve arbitrarily small error on latent concept association by using the value matrix as associative memory.
Memory Architecture
Yabin Zhang, Wenjie Zhu et al.
arXiv 2024 · 2024
Dual Memory Networks combines a Dynamic Memory Network, Static Memory Network, a shared ReadOut module, Projection Layers ω, and a Memory Interactive Strategy to build sample-adaptive classifiers on top of frozen CLIP encoders. On zero-shot ImageNet with ViT-B/16, Dual Memory Networks achieves 72.25% accuracy vs 66.73% for CLIP and 68.98% for TPT, a +5.52 and +3.27 point gain respectively.
Memory Architecture
Xuechen Liang, Yangfan He et al.
· 2024
SAGE coordinates Iterative Feedback, Reflection, Short-Term Memory, Long-Term Memory, and MemorySyntax so the assistant, checker, and user co-evolve policies and memories over time. On AgentBench and long-context QA like HotpotQA, SAGE lifts GPT-3.5’s Database score from 25.9 to 37.6 and HotpotQA answer accuracy from 48.5% to 68.3%.
Memory Architecture
Eshaan Nichani, Jason D. Lee, Alberto Bietti
· 2024
Understanding Factual Recall in Transformers via Associative Memories analyzes linear associative memories, MLP associative memories, and a one-layer transformer with multi-head self-attention plus an MLP on a synthetic factual recall task. Understanding Factual Recall in Transformers via Associative Memories proves that storing N random associations requires Θ(N log M) bits and that a single-layer transformer achieves 100% accuracy whenever either self-attention or MLP parameters scale linearly with the number of facts.
Memory Architecture
Ryuhei Hamaguchi, Yasutaka Furukawa et al.
arXiv 2023 · 2023
Hierarchical Neural Memory Network (HMNet) stacks multi-level latent memories z1–z3 with Event-write, Up-write, Down-write, Update, and Readout operations driven by Event Sparse Cross Attention. On DSEC-Semantic, HMNet-L3 reaches 57.4 mIoU with event–RGB fusion, improving over a ResNet-50 baseline at 54.1 mIoU while also reducing latency, and on GEN1 HMNet-B1 matches AED at similar mAP with 57% lower latency.
Memory Architecture
Tian-Xing Xu, Yuan-Chen Guo et al.
arXiv 2023 · 2023
MBPTrack combines a Decoupling Feature Propagation Module, BPLocNet, box-prior sampling, and point-to-reference aggregation to track 3D objects from point clouds using temporal memory and size-aware localization. On KITTI, MBPTrack achieves 70.3% Success and 87.9% Precision, improving over CXTrack’s 67.5%/85.3% by +2.8/+2.6.
Memory Architecture
Manuele Barraco, Sara Sarto et al.
arXiv 2023 · 2023
PMA-Net augments Transformer self-attention with memory banks, prototype keys, and prototype values built via K-Means and k-NN over past activations. On COCO Karpathy test, PMA-Net reaches 131.5 CIDEr under cross-entropy training, a +3.7 gain over a re-trained Transformer baseline.
Memory Architecture
Jason Yoo, Frank Wood
arXiv 2022 · 2022
BayesPCN combines predictive coding, conjugate Bayesian updates over W_{0:L}, sequential importance sampling, and a diffusion-based forget mechanism to build a hierarchical associative memory that supports continual one-shot writes. On CIFAR10 and Tiny ImageNet hetero-associative tasks, BayesPCN matches offline GPCN with MSE as low as 0.0000 while online GPCN rises to 0.0791 MSE on CIFAR10 mask at sequence length 1024.
Memory Architecture
Beren Millidge, Tommaso Salvatori et al.
· 2022
Universal Hopfield Networks decompose single-shot associative memory into similarity, separation, and projection, instantiating Hopfield networks, sparse distributed memories, dense associative memories, and modern continuous Hopfield networks within one energy-based framework. Universal Hopfield Networks then compare dot product, Euclidean, Manhattan, and other similarity functions, finding that Manhattan and Euclidean similarity often yield higher retrieval capacity and robustness than dot-product-based modern continuous Hopfield networks on MNIST, CIFAR10, and Tiny ImageNet.
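The similarity, separation, projection decomposition described above can be sketched as a single retrieval function with a swappable similarity. This is an illustrative sketch, not the authors' code: using softmax with inverse temperature `beta` as the separation step and negated distances as similarity scores are assumptions made here for concreteness.

```python
import math


def dot_similarity(memory, query):
    return sum(m * q for m, q in zip(memory, query))


def neg_euclidean(memory, query):
    """Negated Euclidean distance, so that closer means a higher score."""
    return -math.sqrt(sum((m - q) ** 2 for m, q in zip(memory, query)))


def neg_manhattan(memory, query):
    """Negated Manhattan distance, so that closer means a higher score."""
    return -sum(abs(m - q) for m, q in zip(memory, query))


def softmax(scores, beta):
    """Separation step: sharpen scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(beta * (s - top)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(memories, query, similarity, beta=10.0, values=None):
    """Similarity -> separation -> projection over the stored patterns.

    With values=None the memories project onto themselves (auto-association);
    passing separate values gives hetero-associative recall.
    """
    values = memories if values is None else values
    scores = [similarity(m, query) for m in memories]
    weights = softmax(scores, beta)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


memories = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
noisy = [0.9, 0.1, 0.0]
for sim in (dot_similarity, neg_euclidean, neg_manhattan):
    print(sim.__name__, [round(x, 3) for x in retrieve(memories, noisy, sim)])
```

Swapping the `similarity` argument is the only change needed to compare the functions, which mirrors how the framework isolates that choice from the rest of the retrieval step.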
Memory Architecture
Dmitry Krotov
arXiv 2021 · 2021
Hierarchical Associative Memory organizes fully recurrent Modern Hopfield Networks into layered architectures using Lagrangian functions, hierarchical time scales, and symmetric feedforward–feedback weights as core components. Hierarchical Associative Memory theoretically extends Dense Associative Memories to arbitrary depth and local connectivity, deriving explicit dynamical and energy functions rather than reporting benchmark numbers.
Memory Architecture
Taylor W. Webb, Ishan Sinha, Jonathan D. Cohen
arXiv 2020 · 2020
Emergent Symbol Binding Network (ESBN) combines an LSTM controller, a shared image encoder fe, temporal context normalization, and a two-column key-value memory to bind abstract variables to concrete image embeddings. ESBN achieves ≥95% test accuracy on same/different, RMTS, distribution-of-three, and identity rules tasks while generalizing to withheld Unicode characters, unlike LSTM, NTM, MNM, Relation Net, Transformer, and PrediNet baselines.
Memory Architecture
Dmitry Krotov, John Hopfield
arXiv 2020 · 2020
Large Associative Memory Problem rewrites associative memory using coupled feature neurons, memory neurons, an energy function, and Lagrangian functions so that all interactions are pairwise yet recover Dense and modern Hopfield behavior. Large Associative Memory Problem shows that with appropriate choices of activation and Lagrangian functions, the effective dynamics match Dense Associative Memories and modern Hopfield networks, enabling storage of N_mem ∼ min(N_f^{n−1}, N_h) or even exponential in N_f memories without many‑body synapses.
Memory Architecture
Andrea Banino, Adrià Puigdomènech Badia et al.
· 2020
MEMO combines common embeddings, multi-head keys and values, recurrent attention, and a halting policy to flexibly chain episodic memories over multiple hops. On joint bAbI 10k, MEMO achieves 0.21% error versus 4.2% for Memory Networks, while also solving long-distance Paired Associative Inference and shortest path tasks.
Memory Architecture
Hung Le, Truyen Tran, Svetha Venkatesh
· 2020
Self-Attentive Associative Memory (STM) combines Outer Product Attention (OPA), Self-attentive Associative Memory (SAM), Mi-Write, Mr-Read, and Mr-Transfer into a dual item–relational memory system. On the bAbI question answering benchmark, STM attains 0.39 ± 0.18 mean error versus 0.55 ± 0.74 for MNM-p, establishing a new state-of-the-art.
Memory Architecture
Tiago Ramalho, Marta Garnelo
arXiv 2019 · 2019
Adaptive Posterior Learning combines an Encoder, Memory store, Memory controller, and Decoder (relational self-attention, relational working memory, or LSTM) to approximate posteriors from a sparse external memory. On Omniglot, Adaptive Posterior Learning achieves 99.9% 5-way 5-shot accuracy, matching MAML and SNAIL, while using fewer than 2 stored examples per class.
Memory Architecture
Nabiha Asghar, Lili Mou et al.
arXiv 2018 · 2018
Progressive Memory Banks for Incremental Domain Adaptation augments a BiLSTM with a directly parameterized memory bank, key-value memory, and an attention-based memory mechanism that is progressively expanded during incremental domain adaptation. On MultiNLI Fic→Gov, Progressive Memory Banks for Incremental Domain Adaptation with memory and vocabulary expansion reaches 67.55% on Fic and 70.82% on Gov, compared to 65.62% and 69.90% for fine-tuning with no memory expansion.
Memory Architecture
Hao Zhou, Minlie Huang et al.
arXiv 2017 · 2017
Emotional Chatting Machine (ECM) augments a GRU encoder-decoder with Emotion Category Embedding, Internal Memory, and External Memory to control emotional content in generated replies. On the Emotional STC dataset, ECM reaches 0.773 emotion accuracy vs 0.724 for Emb and 0.179 for Seq2Seq, while keeping perplexity comparable.
Memory Architecture
Baolin Peng, Kaisheng Yao
arXiv 2015 · 2015
RNN-EM augments a simple recurrent neural network with an external memory Mt, forget gate ft, update gate ut, and a cosine-similarity weight wt to read and write contextual representations. On the ATIS language understanding benchmark, RNN-EM attains 95.25% F1, beating the previous best LSTM result of 94.85% by 0.40 points.