Category

Personalization

Personalized memory for AI assistants — user preference learning, persona modeling, and adaptive long-horizon dialogue.

10 papers

Benchmark · Long-Term Memory

A-MBER: Affective Memory Benchmark for Emotion Recognition

Deliang Wen, Ke Sun, Yu Wang

· 2026

A-MBER builds multi-session conversational scenarios via a staged pipeline of persona specification, long-horizon planning, conversation generation, annotation, question construction, and benchmark-unit packaging. On A-MBER, a structured memory system reaches 0.69 judgment accuracy, 0.66 retrieval accuracy, and 0.65 explanation accuracy, versus 0.34, 0.29, and 0.31 for a no-memory baseline.

Benchmark · Long-Term Memory

BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs

Sangyeon Yoon, Sunkyoung Kim et al.

· 2026

BenchPreS combines Contexts, User Profiles, Preference Attributes, Gold Labeling, and an LLM-as-judge framework to test context-aware preference selectivity in persistent-memory LLMs. On the benchmark, GPT-5.2 reaches an 87.33% Appropriate Application Rate yet still incurs a 40.95% Misapplication Rate, compared with Gemini 3 Pro's 86.48% Misapplication Rate.

RAG · Benchmark · Long-Term Memory

MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents

Shu Wang, Edwin Yu et al.

· 2026

MemMachine combines short-term, long-term, and profile memory with a Retrieval Agent to store raw conversational episodes and retrieve clustered context around nucleus matches. On LoCoMo, MemMachine scores 0.9169 with gpt-4.1-mini while using about 80% fewer input tokens than Mem0, and reaches 93.0% on LongMemEval-S with GPT-5-mini.
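The nucleus-and-cluster retrieval idea can be sketched as below; the lexical similarity function, the fixed window expansion, and all names are illustrative assumptions, not MemMachine's actual implementation:

```python
from dataclasses import dataclass

def jaccard(a: str, b: str) -> float:
    """Toy lexical similarity, standing in for a real embedding model."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

@dataclass
class Episode:
    turn: int
    text: str

def retrieve_cluster(episodes, query, window=1):
    """Find the best-matching 'nucleus' episode, then return it together
    with its neighboring turns as clustered context."""
    nucleus = max(range(len(episodes)),
                  key=lambda i: jaccard(episodes[i].text, query))
    lo, hi = max(0, nucleus - window), min(len(episodes), nucleus + window + 1)
    return episodes[lo:hi]
```

Returning neighbors around the nucleus, rather than the single best hit, is one plausible way to preserve the ground-truth conversational context the paper emphasizes.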

Benchmark · Agent Memory

MemoryCD: Benchmarking Long-Context User Memory of LLM Agents for Lifelong Cross-Domain Personalization

Weizhi Zhang, Xiaokai Wei et al.

· 2026

MemoryCD builds a user memory pool M_u from lifelong Amazon Review histories and evaluates long-context prompting, Mem0, LoCoMo, ReadAgent, MemoryBank, and A-Mem across rating, ranking, and personalized text tasks. On Books and Home & Kitchen, GPT-5 reaches RMSE 0.551–0.624 and NDCG@3 up to 0.610, while Gemini-2.5 Pro peaks at ROUGE-L 0.222 for generation, revealing substantial remaining gaps to real user behavior.

Benchmark

In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents

Zhen Tan, Jun Yan et al.

ACL 2025 · 2025

Reflective Memory Management (RMM) uses a memory bank, retriever, reranker, and LLM to implement Prospective Reflection and Retrospective Reflection for topic-based storage and RL-based retrieval refinement. On LongMemEval, RMM with GTE achieves 69.8% Recall@5 and 70.4% accuracy, compared to 62.4% Recall@5 and 63.6% accuracy for GTE RAG.
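The two reflection phases can be illustrated with a toy memory bank; the topic-keyed storage, overlap scoring, and scalar weight update below are simplified stand-ins for RMM's retriever, reranker, and RL-based refinement, and every name is hypothetical:

```python
class ReflectiveMemory:
    def __init__(self):
        self.bank = {}     # topic -> list of stored memory snippets
        self.weights = {}  # topic -> learned usefulness weight

    def prospective_reflect(self, topic, summary):
        """Prospective Reflection (sketch): store a topic-keyed summary
        of the finished session for future retrieval."""
        self.bank.setdefault(topic, []).append(summary)
        self.weights.setdefault(topic, 1.0)

    def retrieve(self, query, k=2):
        """Score each memory by word overlap scaled by its topic weight,
        then return the top-k (a stand-in for retrieve-then-rerank)."""
        scored = []
        for topic, items in self.bank.items():
            w = self.weights[topic]
            for m in items:
                overlap = len(set(query.split()) & set(m.split()))
                scored.append((w * overlap, m, topic))
        scored.sort(reverse=True)
        return scored[:k]

    def retrospective_reflect(self, topic, was_cited, lr=0.1):
        """Retrospective Reflection (sketch): nudge a topic's weight up or
        down depending on whether its memories were actually used in the
        response, standing in for the RL retrieval refinement."""
        if topic in self.weights:
            self.weights[topic] += lr if was_cited else -lr
```

The feedback signal (`was_cited`) mirrors the paper's idea of refining retrieval from the dialogue agent's own usage of memories rather than from external labels.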

Benchmark · Agent Memory

Memoria: A Scalable Agentic Memory Framework for Personalized Conversational AI

Samarth Sarin, Lovepreet Singh et al.

· 2025

Memoria augments LLM chats with structured conversation logging, a dynamic user persona maintained as a knowledge graph, session-level memory for real-time context, and retrieval for context-aware responses, providing persistent, interpretable memory. On LongMemEval's single-session-user and knowledge-update subsets, Memoria reaches 87.1% and 80.8% accuracy respectively, surpassing A-Mem (OpenAI) while using much shorter prompts.
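A minimal sketch of a KG-backed persona, assuming a triple-store structure (Memoria's actual schema is not given here); the single extraction rule is deliberately naive and stands in for structured conversation logging:

```python
# Assumed structure: the persona is a set of
# (subject, relation, object) triples updated per turn.
persona = set()

def log_turn(user_msg: str):
    """Naive triple extraction, a stand-in for Memoria's
    structured logging; only handles a 'likes' pattern."""
    if " likes " in user_msg:
        subj, obj = user_msg.split(" likes ", 1)
        persona.add((subj.strip(), "likes", obj.strip(". ")))

def persona_context() -> str:
    """Serialize the persona graph into a short, prompt-ready context,
    which is how a KG persona can keep prompts compact."""
    return "; ".join(f"{s} {r} {o}" for s, r, o in sorted(persona))
```

Serializing only the triple store, rather than replaying full transcripts, is one way such a design can achieve the much shorter prompts the summary mentions.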

Benchmark · Agent Memory

PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory

Bowen Jiang, Yuan Yuan et al.

· 2025

PersonaMem-v2 combines implicit personas, RL with long-context reasoning, RL with agentic memory, and a user privacy-aware design to train Qwen3-4B with GRPO on implicit user preferences drawn from long, noisy histories. PersonaMem-v2 achieves 55.2% MCQ and 60.7% open-ended accuracy on the benchmark, surpassing GPT-5-Chat's 45.6% and 46.2% while using a 2k-token agentic memory instead of full 32k–128k contexts.

Benchmark · Long-Term Memory

Pre-Storage Reasoning for Episodic Memory: Shifting Inference Burden to Memory for Personalized Dialogue

Sangyeop Kim, Yohan Lee et al.

· 2025

PREMem builds long-term dialogue memory by combining episodic memory extraction, pre-storage memory reasoning, semantic clustering, a persistent memory pool, and an inference phase over enriched memory fragments. PREMem reaches a 71.4 LLM-as-a-judge score on LongMemEval with a GPT-4.1 base, a +15.5 gain over HippoRAG 2 and +9.6 over A-Mem.
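The pre-storage idea, shifting inference from query time to write time, can be sketched as follows; the rule-based enrichment and the `pre_storage_reason` helper are hypothetical stand-ins for PREMem's LLM reasoning pass:

```python
def pre_storage_reason(episode: str) -> dict:
    """Derive enriched fragments at write time (here via a naive rule,
    standing in for an LLM pass), so query-time inference is cheap."""
    fragments = [episode]
    if "moved to" in episode:
        city = episode.split("moved to")[-1].strip(". ")
        fragments.append(f"user currently lives in {city}")
    return {"raw": episode, "enriched": fragments}

# Persistent memory pool of enriched episodes.
pool = [pre_storage_reason(e) for e in [
    "I moved to Berlin.",
    "I love hiking on weekends.",
]]

def answer(query: str):
    """Inference phase: match the query against enriched fragments only,
    so no multi-hop reasoning is needed at read time."""
    words = query.lower().split()
    return [f for m in pool for f in m["enriched"]
            if any(w in f for w in words)]
```

The payoff is that a question like "where does the user live" hits the pre-computed fact directly, instead of requiring the reader to re-derive it from the raw episode.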