RAG

Research on retrieval-augmented generation (RAG) and non-parametric memory for language models.

33 papers

RAG · Benchmark · Agent Memory · Memory Architecture

ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying

Xingyu Lyu, Jianfeng He et al.

· 2026

ADAM combines Anchor extraction, Distribution estimation, Anchor selection, and Query generation to adaptively probe agent memory via an auxiliary generator and entropy-based selection. On the EHRAgent benchmark with Llama2-7b-chat, ADAM reaches EQ=77 and ASR=1.00, compared to MEXTRA’s EQ=44 and ASR=0.89.

arXiv:2604.09747

RAG

A Dynamic Retrieval-Augmented Generation System with Selective Memory and Remembrance

Okan Bursa

· 2026

Adaptive RAG Memory (ARM) augments a standard retriever–generator stack with a Dynamic Embedding Layer and Remembrance Engine that track usage statistics and apply selective remembrance and decay to embeddings. On a lightweight retrieval benchmark, ARM achieves NDCG@5 ≈ 0.9401 and Recall@5 = 1.000 with 22M parameters, matching larger baselines like gte-small while providing the best efficiency among ultra-efficient models.

arXiv:2601.02428
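The ARM entry reports Recall@5 = 1.000 alongside an NDCG@5 below 1, which is possible because the two metrics measure different things. The sketch below uses the standard binary-relevance definitions to show how a relevant document that is retrieved, but not ranked first, produces exactly that pattern. The document IDs and function names are illustrative assumptions, not taken from the paper.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal ordering's gain."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2) = 1
        for rank, doc in enumerate(ranked_ids[:k])
        if doc in relevant_ids
    )
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical query: the single relevant document is retrieved at position 2.
ranked = ["d2", "d1", "d3"]
relevant = {"d1"}
recall = recall_at_k(ranked, relevant, k=5)
ndcg = ndcg_at_k(ranked, relevant, k=5)
print(recall, round(ndcg, 4))
```

Recall only asks whether the relevant item is somewhere in the top 5, while NDCG also penalizes it for not being at the top.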

RAG · Long-Term Memory

HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues

Yijie Zhong, Yunfan Gao, Haofen Wang

· 2026

HingeMem combines Boundary Guided Long-Term Memory, Dialogue Boundary Extraction, Memory Construction, Query Adaptive Retrieval, Hyperedge Rerank, and Adaptive Stop to segment dialogues into element-indexed hyperedges and plan query-specific retrieval. On LOCOMO, HingeMem achieves 63.9 overall F1 and 75.1 LLM-as-a-Judge score, surpassing the best baseline Zep (56.9 F1) by 7.0 F1 without using category-specific QA formats.

arXiv:2604.06845

Benchmark · RAG

Knowledge Packs: Zero-Token Knowledge Delivery via KV Cache Injection

Andrey Pustovit

· 2026

Knowledge Packs combines KV Cache Injection, KV–Prefix Equivalence, Banked Routing, and KV Composition to deliver retrieved knowledge and steering via pre-computed KV states instead of prompt tokens. On HotpotQA, Knowledge Packs’ KV-chat matches RAG at 65.2% EM on Qwen3-8B with 0/500 divergences while eliminating 284 tokens of retrieval text per query.

arXiv:2604.03270

RAG · Benchmark · Long-Term Memory

MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents

Shu Wang, Edwin Yu et al.

· 2026

MemMachine combines Short-term memory, Long-term memory, Profile memory, and the Retrieval Agent to store raw conversational episodes and retrieve clustered context around nucleus matches. On LoCoMo, MemMachine scores 0.9169 with gpt-4.1-mini while using about 80% fewer input tokens than Mem0, and reaches 93.0% on LongMemEvalS with GPT-5-mini.

arXiv:2604.04853
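One way to picture MemMachine's "clustered context around nucleus matches" is to expand each matched raw episode to include its neighbors in the conversation. The sketch below rests on that assumption only. The function name `expand_nuclei` and the `window` parameter are hypothetical and are not taken from the MemMachine paper.

```python
def expand_nuclei(episodes, nucleus_indices, window=1):
    """Return the episodes within `window` positions of any nucleus match.

    Overlapping neighborhoods are merged, and conversation order is preserved.
    """
    keep = set()
    for i in nucleus_indices:
        lo = max(0, i - window)
        hi = min(len(episodes) - 1, i + window)
        keep.update(range(lo, hi + 1))
    return [episodes[i] for i in sorted(keep)]

# Hypothetical conversation of eight raw episodes; retrieval matched indices 1 and 5.
episodes = ["e0", "e1", "e2", "e3", "e4", "e5", "e6", "e7"]
context = expand_nuclei(episodes, nucleus_indices=[1, 5], window=1)
print(context)
```

Returning raw neighbors rather than summaries keeps the original wording intact, which fits the entry's emphasis on storing raw conversational episodes.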

RAG · Agent Memory · Long-Term Memory · Memory Architecture

Memory as Metabolism: A Design for Companion Knowledge Systems

Stefan Miteski

· 2026

Memory as Metabolism defines companion knowledge systems with five retention operations (TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, AUDIT) plus memory gravity and minority-hypothesis retention over a raw buffer, active wiki, and cold memory. Instead of benchmark gains, Memory as Metabolism’s main result is a governance specification that separates descriptive, taxonomic, and normative claims and predicts improved coherence stability, fragility resistance, monoculture resistance, and effective minority-hypothesis influence for companion wikis.

arXiv:2604.12034

Survey · RAG · Agent Memory

Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers

Pengfei Du

· 2026

Memory for Autonomous LLM Agents decomposes agent memory into a POMDP-grounded write–manage–read loop, a three-dimensional taxonomy, and five mechanism families spanning context compression, retrieval stores, reflection, hierarchical virtual context, and policy-learned management. The survey synthesizes results such as Voyager’s 15.3× tech-tree speedup and MemoryArena’s 80%→45% drop to show that memory architecture often matters more than backbone choice.

arXiv:2603.07670

RAG

Self-Correcting RAG: Enhancing Faithfulness via MMKP Context Selection and NLI-Guided MCTS

Shijia Xu, Zhou Wu et al.

· 2026

Self-Correcting RAG combines a MMKP-based Context Selector, NLI-Guided MCTS Generator, and Self-Correcting RAG Optimizer to jointly optimize retrieval and reasoning under token and redundancy constraints. On six QA datasets, Self-Correcting RAG attains average EM 37.1 and F1 45.8, beating the strongest baseline CRAG (avg EM 34.3, F1 43.3).

arXiv:2604.10734
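Several entries on this page, including Self-Correcting RAG, report exact match (EM) and F1 for question answering. The sketch below shows the usual token-overlap form of those metrics under simplifying assumptions: lowercasing and whitespace tokenization only, without the punctuation and article stripping that benchmark scripts often add. The example strings are made up.

```python
from collections import Counter

def exact_match(prediction, gold):
    """1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())

def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall between prediction and gold."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

em = exact_match("the Eiffel Tower", "Eiffel Tower")
f1 = token_f1("the Eiffel Tower", "Eiffel Tower")
print(em, round(f1, 2))
```

A nearly correct answer scores 0 on EM but still earns partial credit on F1, which is why F1 averages typically sit above EM averages.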

RAG

SE-Search: Self-Evolving Search Agent via Memory and Dense Reward

Jian Li, Yizhang Jin et al.

· 2026

SE-Search combines Memory Purification, Atomic Query, and Dense Rewards inside a GRPO-trained search agent that follows a Think-Search-Memorize-Answer loop. On seven QA benchmarks, SE-Search-3B attains 0.420 average EM, a 10.8-point absolute and 33.8% relative gain over Search-R1-Base.

arXiv:2603.03293

RAG · Benchmark · Agent Memory · Long-Term Memory · Memory Architecture

Evaluating Long-Term Memory for Long-Context Question Answering

Alessandra Terranova, Björn Ross, Alexandra Birch

· 2025

Evaluating Long-Term Memory for Long-Context Question Answering compares Full Context, RAG, A-Mem, RAG+PromptOpt, and RAG+EpMem memory components across semantic, episodic, and procedural memory for long conversational QA. On LoCoMo, RAG+EpMem reaches an average F1 ranking of 1.83 for Llama 3.2-3B Instruct and 1.80 for GPT-4o mini while using around 1,000 tokens per query versus over 23,000 for Full Context.

arXiv:2510.23730

Pick · RAG · Benchmark

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Bernal Jiménez Gutiérrez, Yiheng Shu et al.

ICML 2025 · 2025

HippoRAG 2 combines Offline Indexing, a schema-less Knowledge Graph, Dense-Sparse Integration, Deeper Contextualization, and Recognition Memory into a neuro-inspired non-parametric memory system for LLMs. On the joint RAG benchmark suite, HippoRAG 2 achieves 59.8 average F1 versus 57.0 for NV-Embed-v2, including 71.0 F1 on 2Wiki compared to 61.5 for NV-Embed-v2.

arXiv:2502.14802 Code

RAG · Benchmark · Memory Architecture

Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Chulun Zhou, Chunkang Zhang et al.

· 2025

HGMEM represents working memory as a hypergraph with Hypergraph-based Memory Storage, Adaptive Memory-based Evidence Retrieval, and Dynamic Memory Evolving to build high-order correlations across entities and facts. On Prelude long narrative understanding, HGMEM with GPT-4o achieves 73.81% accuracy compared to 72.22% for HippoRAG v2, while also reaching 69.74 comprehensiveness on LongBench generative sense-making QA.

arXiv:2512.23959

RAG · Benchmark · Memory Architecture

Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation

Jackson Hassell, Dan Zhang et al.

· 2025

Learning from Supervision with Semantic and Episodic Memory combines a performance agent, critic agent, semantic memory, episodic memory, and memory retriever to turn label-grounded critiques into reusable supervision without parameter updates. On the Multi-Condition Ranking dataset with Mixtral 8x22B and o4-mini as critic, the method reaches 85.6% accuracy, a 24.8-point gain over the EP_LABEL baseline at 60.8%.

arXiv:2510.19897

RAG

LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics

Marc Glocker, Peter Hönig et al.

· 2025

LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics coordinates a routing agent, task planning agent, and knowledge base agent over RAG and ChromaDB to translate household commands into grounded robot actions. In three tabletop scenarios, the system with Qwen2.5-32B achieves 84.3% total lenient task planning accuracy versus 68.7% for Gemma2-27B and 61.1% for LLaMa3.1-8B.

arXiv:2504.21716

RAG

MemInsight: Autonomous Memory Augmentation for LLM Agents

Rana Salama, Jason Cai et al.

· 2025

MemInsight augments agent memory using Attribute Mining, Annotation and Attribute Prioritization, and Memory Retrieval modules that generate and exploit structured attributes over past interactions. On the LoCoMo question answering benchmark, MemInsight with Claude-3-Sonnet priority augmentation achieves 60.5% Recall@5 versus 26.5% for DPR, a 34.0-point improvement.

arXiv:2503.21760

RAG · Benchmark · Memory Architecture

Memory-Augmented Log Analysis with Phi-4-mini: Enhancing Threat Detection in Structured Security Logs

Anbi Guo, Mahfuza Farooque

· 2025

DM-RAG augments Phi-4-mini with a Short-Term Memory (STM) buffer, Long-Term Memory (LTM) FAISS store, Bayesian fusion, and a logistic regression confidence model for structured log analysis. On UNSW-NB15, DM-RAG reaches 98.70% recall and 69.59% F1, beating the Phi-4 + RAG (MITRE) baseline in F1 by 17.89 points.

arXiv:2510.00529

RAG · Benchmark · Memory Architecture

Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models

Jiaqi Cao, Jiarui Wang et al.

· 2025

Memory Decoder combines a Pre-training stage that aligns with kNN-LM distributions and an Inference interpolation mechanism that mixes Memory Decoder and base LLM outputs without changing base parameters. On Wikitext-103, Memory Decoder with 124M parameters reaches 13.36 perplexity on GPT2-small versus 14.76 for DAPT, and on specialized domains a single 0.5B Memory Decoder reduces average perplexity from 14.88 to 4.05 on Qwen2-0.5B.

arXiv:2508.09874
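The inference-time mixing described for Memory Decoder can be pictured as a weighted average of two next-token distributions, in the style of kNN-LM interpolation. The sketch below assumes a simple linear mix. The weight `lam`, the function name, and the toy three-token vocabulary are illustrative assumptions, not values from the paper.

```python
def interpolate(p_base, p_mem, lam):
    """Mix two next-token distributions: lam * p_mem + (1 - lam) * p_base."""
    if len(p_base) != len(p_mem):
        raise ValueError("distributions must cover the same vocabulary")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * m + (1.0 - lam) * b for b, m in zip(p_base, p_mem)]

# Toy vocabulary of three tokens; both inputs already sum to 1.
p_base = [0.7, 0.2, 0.1]   # hypothetical base LLM probabilities
p_mem = [0.1, 0.8, 0.1]    # hypothetical memory-module probabilities
mixed = interpolate(p_base, p_mem, lam=0.25)
print([round(p, 3) for p in mixed])
```

Because the mix is a convex combination, the result is still a valid distribution, and the base model's parameters are never touched.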

RAG

Memory-enhanced Retrieval Augmentation for Long Video Understanding

Huaying Yuan, Zheng Liu et al.

· 2025

MemVid combines a memory model, memorizer, retriever, and generator so a reasoning-oriented KV-cache memory can produce task-specific clues that drive video moment retrieval. On VideoMME (with subtitles, Avg column), MemVid scores 65.7% compared to 61.0% for Video-XL-7B, a 4.7-point gain at the same 7B scale.

arXiv:2503.09149

RAG · Benchmark · Agent Memory

Memory in the Age of AI Agents

Yuyang Hu, Shichun Liu et al.

· 2025

Memory in the Age of AI Agents formalizes agent memory with Memory Formation, Memory Evolution, and Memory Retrieval operators, and classifies memories into token-level, parametric, and latent forms plus factual, experiential, and working functions. Its main result is a unified Forms–Functions–Dynamics framework that consolidates fragmented LLM agent memory work, benchmarks, and open-source frameworks into a coherent taxonomy.

arXiv:2512.13564

Benchmark · RAG

MEPIC: Memory Efficient Position Independent Caching for LLM Serving

Qian Wang, Zahra Yousefijamarani et al.

· 2025

MEPIC extends vLLM with a Chunk Cache Coordinator, Chunk Matcher, Hybrid KV Manager, Chunk LRU Manager, and Chunk Processor to manage canonical, page-aligned, position-independent KV chunks in HBM. On long-context workloads, MEPIC reduces HBM usage by up to 5.21× and lowers latency by up to 11.48% compared to CacheBlend on Mistral-7B-Instruct-v0.3.

arXiv:2512.16822
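The MEPIC entry names a Chunk LRU Manager among its components. As a rough illustration of least-recently-used eviction over cached chunks, the sketch below uses Python's `OrderedDict`. The class `ChunkLRU`, its methods, and the string stand-ins for KV data are hypothetical and do not reflect MEPIC's or vLLM's actual interfaces.

```python
from collections import OrderedDict

class ChunkLRU:
    """Fixed-capacity cache that evicts the least recently used chunk first."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._chunks = OrderedDict()  # oldest entry first, newest last

    def get(self, chunk_id):
        """Return the cached value and mark it as recently used, or None on a miss."""
        if chunk_id not in self._chunks:
            return None
        self._chunks.move_to_end(chunk_id)
        return self._chunks[chunk_id]

    def put(self, chunk_id, kv):
        """Insert or refresh a chunk and return the IDs evicted to make room."""
        if chunk_id in self._chunks:
            self._chunks.move_to_end(chunk_id)
        self._chunks[chunk_id] = kv
        evicted = []
        while len(self._chunks) > self.capacity:
            old_id, _ = self._chunks.popitem(last=False)
            evicted.append(old_id)
        return evicted

cache = ChunkLRU(capacity=2)
cache.put("a", "kv-a")
cache.put("b", "kv-b")
cache.get("a")                      # "a" becomes the most recently used chunk
evicted = cache.put("c", "kv-c")    # "b" is now the oldest and is evicted
print(evicted)
```

A real serving system would track chunk sizes in memory pages rather than counting entries; counting entries keeps the sketch short.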

RAG

MeVe: A Modular System for Memory Verification and Effective Context Control in Language Models

Andreas Ottem

· 2025

MeVe decomposes retrieval into Initial Retrieval, Relevance Verification, Fallback Retrieval, Context Prioritization, and Token Budgeting to tightly control what enters the LLM context. On a Wikipedia subset and HotpotQA, MeVe reduces average context from 188.8 to 79.8 tokens and from 308.6 to 78.5 tokens respectively compared to Standard RAG while keeping retrieval time comparable.

arXiv:2509.01514
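MeVe's final stage, Token Budgeting, caps how much retrieved text reaches the model. The sketch below shows one plausible greedy form of that idea: take passages in descending relevance order and skip any that would exceed the budget. The scoring values, the whitespace token count, and the function name are assumptions for illustration, not MeVe's actual procedure.

```python
def budget_context(passages, max_tokens):
    """Greedily select (text, score) passages by score without exceeding max_tokens.

    Token cost is approximated by whitespace-separated word count.
    """
    selected, used = [], 0
    for text, _score in sorted(passages, key=lambda p: p[1], reverse=True):
        cost = len(text.split())
        if used + cost <= max_tokens:
            selected.append(text)
            used += cost
    return selected, used

# Hypothetical verified passages with relevance scores.
passages = [
    ("alpha beta gamma", 0.9),
    ("delta epsilon zeta eta", 0.8),
    ("theta iota", 0.5),
]
selected, used = budget_context(passages, max_tokens=5)
print(selected, used)
```

The second passage is skipped because it would push the total to 7 tokens, while the shorter third passage still fits.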

RAG · Long-Term Memory · Memory Architecture

Mnemosyne: An Unsupervised, Human-Inspired Long-Term Memory Architecture for Edge-Based LLMs

Aneesh Jonelagadda, Christina Hahn et al.

· 2025

Mnemosyne combines a Commitment pipeline with substance and redundancy filters, a probabilistic Recall traversal over a graph-structured store, asynchronous Core Summary updates, and a Pruning module to manage long-term memory on edge devices. On the LoCoMo benchmark, Mnemosyne reaches a 60.42% temporal reasoning J-score and a 54.55% overall J-score, versus 51.55% and 62.74% for Memory-R1, so it leads on temporal reasoning but trails overall. In human evaluations it achieves a 65.8% win rate, compared to 31.07% for a naive RAG baseline.

arXiv:2510.08601

RAG

MobileRAG: A Fast, Memory-Efficient, and Energy-Efficient Method for On-Device RAG

Taehwan Park, Geonho Lee, Min-Soo Kim

· 2025

MobileRAG builds a fully on-device RAG stack by integrating EcoVector, Selective Content Reduction, DB Construction, and Chat Application over a local SQLite store. On SQuAD with Qwen2.5 1.5B, MobileRAG reaches 65.1% accuracy while reducing TTFT from 11.78 s for Advanced RAG to 7.41 s and cutting energy consumption from 89.09 J to 56.28 J.

arXiv:2507.01079

RAG · Benchmark · Long-Term Memory

RGMem: Renormalization Group-inspired Memory Evolution for Language Agents

Ao Tian, Yunfeng Lu et al.

· 2025

RGMem builds a multi-scale memory state using Microscopic Evidence Space DL0, Structured Knowledge Space G, and renormalization operators RK1, RK2, RK3 to evolve user profiles. On PersonaMem with GPT-4.1, RGMem reaches 74.01% Avg., beating Memory OS by 8.98 points.

arXiv:2510.16392

RAG · Benchmark · Agent Memory · Memory Architecture

Semantic Anchoring in Agentic Memory: Leveraging Linguistic Structures for Persistent Conversational Context

Maitreyi Chatterjee, Devansh Agarwal

· 2025

Semantic Anchoring enriches conversational memory with structured memory representation tuples, a hybrid store indexed both densely and symbolically, and a retrieval scoring method. On MultiWOZ-Long, Semantic Anchoring reaches 83.5% Factual Recall and 80.8% Discourse Coherence, beating Entity-RAG by 7.6 and 8.6 points respectively.

arXiv:2508.12630

RAG · Memory Architecture

TeleMem: Building Long-Term and Multimodal Memory for Agentic AI

Chunliang Chen, Ming Guan et al.

· 2025

TeleMem converts interactions into unified semantic nodes via the representation layer, organizes them in a memory graph with Insert and ReInsert, and reads them using closure-based retrieval and a ReAct-style multimodal agent. On ZH-4O, TeleMem reaches 86.33% QA Accuracy, beating the Mem0 baseline at 70.20% and the RAG baseline at 62.45%.

arXiv:2601.06037

RAG

Understanding Users' Privacy Perceptions Towards LLM's RAG-based Memory

Shuning Zhang, Rongjun Ma et al.

· 2025

Understanding Users' Privacy Perceptions Towards LLM's RAG-based Memory analyzes users' mental models, privacy calculus, and expectations around RAG-based memory across generation, management, usage, and updating. The study finds that users demand explicit consent, fine-grained editing and deletion, and visibility into inferred information before they trust RAG-based memory systems.

arXiv:2508.07664

RAG

Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs

Zheng Wang, Zhongyang Li et al.

· 2024

EMG-RAG combines Editable Memory Graphs, a two-stage MDP for Selecting Memories on EMGs, and Data Collection from real smartphone assistants to support editable, graph-structured personal memories. On a 2,500-user business dataset, EMG-RAG with GPT-4 reaches 75.99 BLEU on question answering, a +11.83 gain over M-RAG.

arXiv:2409.19401

RAG

Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models

Mehrdad Farahani, Richard Johansson

· 2024

Deciphering the Interplay of Parametric and Non-parametric Memory applies causal mediation analysis and Path Specific Effects (PSE) inside ATLAS, across two experiments, to trace how parametric and non-parametric memories compete token-by-token. The paper reports a strong shift toward counterfactual answers in altered contexts, with a t-test p-value of 1.60e-4 and Cohen’s d of -0.9851 for non-parametric versus parametric behavior.

arXiv:2410.05162

RAG

Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation

Quanting Xie, So Yeon Min et al.

· 2024

Embodied-RAG builds a multimodal Topological Map and a hierarchical Semantic Forest and then runs Top-down Retrieval with LLM-based selection and hybrid re-ranking to drive Generation of waypoints and explanations. On the E-multimodal Embodied-Experiences dataset, Embodied-RAG reaches P(Q|A)=0.67 for implicit queries (Q only), compared to 0.13 for LightRAG, while building graph memory 9.76× faster than LightRAG.

arXiv:2409.18313

RAG

"Ghost of the past": identifying and resolving privacy leakage from LLM's memory through proactive user interaction

Shuning Zhang, Lyumanshan Ye et al.

· 2024

MemoAnalyzer analyzes past inputs and long-term memories using prompt-based privacy inference, confidence and sensitivity visualization, and source tracking with an editing proxy. In a 5-day study on work, life, and academic tasks, MemoAnalyzer reduced total inferred private information by 22.3% compared to GPT memory while keeping completion time comparable to GPT and Manual baselines.

arXiv:2410.14931

RAG

Retrieval-Augmented Decision Transformer: External Memory for In-context RL

Thomas Schmied, Fabian Paischer et al.

· 2024

Retrieval-Augmented Decision Transformer (RA-DT) combines a vector index, embedding model g(·), maximum inner product search, experience reweighting, and cross-attention layers to retrieve and fuse relevant sub-trajectories into a Decision Transformer policy. On Dark-Room 10×10, RA-DT reaches near-optimal average reward over 40 in-context trials while using a 50-step context window, whereas baselines like Algorithm Distillation require entire episodes of up to 100 steps.

arXiv:2410.07071
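The RA-DT entry lists maximum inner product search (MIPS) as the retrieval step over its vector index. The sketch below is a brute-force version of MIPS over a tiny in-memory list. A real system would use an approximate index, and the vectors and function name here are made up for illustration.

```python
def mips(query, keys, k=1):
    """Return the k (index, score) pairs whose keys have the largest dot product with query."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    order = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]

# Hypothetical embeddings of three stored sub-trajectories.
keys = [
    [0.5, 0.5],
    [2.0, -1.0],
    [-1.0, 3.0],
]
query = [1.0, 0.0]
top = mips(query, keys, k=2)
print(top)
```

Unlike cosine similarity, the inner product is not normalized, so keys with larger norms can rank higher even when their direction matches the query less closely.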

RAG

Toward Conversational Agents with Context and Time Sensitive Long-term Memory

Nick Alonso, Tomás Figliolia et al.

· 2024

Toward Conversational Agents with Context and Time Sensitive Long-term Memory integrates a Tabular Chat Database, Classifying Query Type, Chain-of-Tables for Meta-Data Retrieval, and Combining Meta-Data and Semantic Retrieval to handle time-sensitive and ambiguous conversational queries. On the LoCoMo-derived temporal benchmark, the system achieves 90.32 average recall versus 31.93 for the best Semantic w MetaD baseline.

arXiv:2406.00057