Category

Semantic Memory

Semantic memory for LLMs: structured knowledge, fact storage, and meaning-based retrieval.

6 papers

RAG · Benchmark · Agent Memory · Long-Term Memory · Memory Architecture

Evaluating Long-Term Memory for Long-Context Question Answering

Alessandra Terranova, Björn Ross, Alexandra Birch

2025

This paper compares Full Context, RAG, A-Mem, RAG+PromptOpt, and RAG+EpMem as memory components spanning semantic, episodic, and procedural memory for long conversational QA. On LoCoMo, RAG+EpMem achieves an average F1 ranking of 1.83 with Llama 3.2-3B Instruct and 1.80 with GPT-4o mini, while using around 1,000 tokens per query versus over 23,000 for Full Context.

RAG · Benchmark · Memory Architecture

Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation

Jackson Hassell, Dan Zhang et al.

2025

This paper combines a performance agent, a critic agent, semantic memory, episodic memory, and a memory retriever to turn label-grounded critiques into reusable supervision without parameter updates. On the Multi-Condition Ranking dataset with Mixtral 8x22B and o4-mini as critic, the approach reaches 85.6% accuracy, a 24.8-point gain over the EP_LABEL baseline's 60.8%.

RAG · Benchmark · Agent Memory · Memory Architecture

Semantic Anchoring in Agentic Memory: Leveraging Linguistic Structures for Persistent Conversational Context

Maitreyi Chatterjee, Devansh Agarwal

2025

Semantic Anchoring enriches conversational memory with structured memory representation tuples, a hybrid memory store that maintains both dense and symbolic indexes, and a retrieval scoring method. On MultiWOZ-Long, it reaches 83.5% Factual Recall and 80.8% Discourse Coherence, beating Entity-RAG by 7.6 and 8.6 points respectively.
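To make the idea of scoring over both dense and symbolic indexes concrete, here is a minimal sketch. It is not the paper's scoring method: the weighted sum, the `alpha` parameter, the Jaccard entity overlap, and the `vec`/`entities` record fields are all assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (dense signal)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def jaccard(a, b):
    """Set overlap between two entity collections (symbolic signal)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hybrid_score(query_vec, query_entities, memory, alpha=0.5):
    # Hypothetical combination: a convex mix of dense and symbolic scores.
    dense = cosine(query_vec, memory["vec"])
    symbolic = jaccard(query_entities, memory["entities"])
    return alpha * dense + (1 - alpha) * symbolic

def retrieve(query_vec, query_entities, memories, k=1, alpha=0.5):
    """Return the k memories with the highest hybrid score."""
    ranked = sorted(
        memories,
        key=lambda m: hybrid_score(query_vec, query_entities, m, alpha),
        reverse=True,
    )
    return ranked[:k]

# Toy memory store; the records are invented for this example.
memories = [
    {"id": "m1", "vec": [1.0, 0.0], "entities": {"hotel", "Cambridge"}},
    {"id": "m2", "vec": [0.0, 1.0], "entities": {"train", "London"}},
]
top = retrieve([1.0, 0.0], {"hotel"}, memories, k=1)
```

With these toy records, `m1` ranks first because it matches the query on both the embedding and the entity set. Shifting `alpha` toward 0 or 1 trades symbolic matching against embedding similarity.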

Benchmark · Memory Architecture

WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

Woongyeong Yeo, Kangsan Kim et al.

2025

WorldMM dynamically coordinates Episodic Memory, Semantic Memory, Visual Memory, an Adaptive Retrieval Agent, and a Response Agent to answer queries over hour- to week-long videos. Across five long video QA benchmarks, WorldMM-GPT reaches 69.5% average accuracy, beating M3-Agent (55.1%) by 14.4 points and HippoRAG, the best prior memory baseline (57.0%), by 12.5 points.

Benchmark

DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Peiqi Liu, Zhanqiu Guo et al.

2024

DynaMem maintains a dynamic 3D voxel map, supports querying through embedded vision-language features or multimodal large language models, and exposes exploration primitives and an obstacle map for navigation and manipulation. In real-robot experiments on a Stretch SE3, DynaMem achieves a 70% success rate on dynamic pick-and-drop tasks, compared to 30% for the static OK-Robot baseline.

Benchmark

Learning to Learn Variational Semantic Memory

Xiantong Zhen, Yingjun Du et al.

arXiv · 2020

Variational Semantic Memory combines variational prototype inference with a latent memory m and an attention-based memory update to build probabilistic class prototypes from long-term semantic knowledge. On miniImageNet 5-way 1-shot with a deep backbone, it reaches 65.72% accuracy versus 64.82% for Tian et al. (2020).
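For readers unfamiliar with class prototypes, the sketch below shows the plain deterministic version that the probabilistic formulation builds on: each class is summarized by the mean of its support embeddings, and a query takes the label of the nearest prototype. It deliberately omits the variational inference, the latent memory, and the attention-based update described above. The function names and toy embeddings are assumptions for illustration.

```python
def class_prototypes(support):
    """Average the support embeddings of each class into one prototype.

    `support` maps a class label to a list of equal-length embedding vectors.
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return protos

def classify(query, protos):
    """Assign the query to the class whose prototype is nearest (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda label: sqdist(query, protos[label]))

# Toy 2-way 2-shot episode; the embeddings are invented for this example.
support = {
    "cat": [[0.0, 0.0], [0.0, 2.0]],
    "dog": [[4.0, 4.0], [6.0, 4.0]],
}
protos = class_prototypes(support)
prediction = classify([1.0, 1.0], protos)
```

In a 1-shot setting each prototype is a single embedding, which is one reason a distribution over prototypes informed by prior knowledge can help.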