MemInsight: Autonomous Memory Augmentation for LLM Agents

Authors: Rana Salama, Jason Cai, Michelle Yuan, et al.

2025

TL;DR

MemInsight uses autonomous, attribute-based memory augmentation with priority scoring to reach 60.5% Recall@5 on LoCoMo, a 34.0-point gain over DPR.



THE PROBLEM

LLM agents drown in unstructured memory and noisy long-term context

As interactions accumulate, raw historical data grows rapidly and becomes noisy and imprecise, hindering retrieval and degrading agent performance.

This breakdown in memory retrieval harms long-term reasoning and personalization, especially for complex tasks like conversational recommendation, question answering, and event summarization.

HOW IT WORKS

MemInsight — Autonomous Memory Augmentation

MemInsight’s core mechanism combines Attribute Mining, Annotation and Attribute Prioritization, and Memory Retrieval to turn raw dialogues into structured, queryable memory instances.

You can think of MemInsight as giving the agent a card catalog for its memory, where attributes are index cards and priority decides which drawers to open first.

This structured augmentation lets MemInsight retrieve semantically targeted slices of history that a plain context window or vanilla dense RAG cannot isolate or rank effectively.
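To make the idea of a structured, queryable memory instance concrete, here is a minimal sketch in plain Python. The dict layout and attribute names (`genre`, `emotion`, `intent`) are illustrative assumptions, not the paper's exact schema.

```python
# A memory instance: raw text plus mined attribute-value pairs
# (attribute names here are illustrative, not the paper's schema).
memory_instance = {
    "text": "User said they loved the slow-burn thriller they watched last week.",
    "attributes": {
        "genre": "thriller",
        "emotion": "enthusiasm",
        "intent": "share_opinion",
    },
}

def matches(instance, query_attributes):
    """Return True if the instance carries every queried attribute value."""
    attrs = instance["attributes"]
    return all(attrs.get(k) == v for k, v in query_attributes.items())

print(matches(memory_instance, {"genre": "thriller"}))  # True
```

Because attributes are explicit key-value pairs rather than free text, the agent can filter memory by semantics ("all thriller-related turns") instead of scanning the whole history.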

DIAGRAM

MemInsight Refined Retrieval Flow

This diagram shows how MemInsight uses refined retrieval to answer a question by augmenting the query and filtering augmented memory.
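The refined-retrieval flow in the diagram can be sketched as: mine attributes from the incoming question, then keep only memory instances that share an attribute value. The `mine_attributes` function below is a toy keyword lookup standing in for the LLM call; all names are ours, not the paper's API.

```python
# Toy stand-in for LLM-based attribute mining over the query.
TOY_LEXICON = {"sad": ("emotion", "sadness"), "comedy": ("genre", "comedy")}

def mine_attributes(query):
    return dict(TOY_LEXICON[w] for w in query.lower().split() if w in TOY_LEXICON)

def refined_retrieve(query, memory):
    """Augment the query with mined attributes, then filter memory on them."""
    q_attrs = mine_attributes(query)
    return [m for m in memory
            if any(m["attributes"].get(k) == v for k, v in q_attrs.items())]

memory = [
    {"text": "Recommended a comedy special.", "attributes": {"genre": "comedy"}},
    {"text": "Discussed a documentary.", "attributes": {"genre": "documentary"}},
]
print([m["text"] for m in refined_retrieve("any comedy tonight?", memory)])
# ['Recommended a comedy special.']
```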

DIAGRAM

MemInsight Evaluation Pipeline Across Tasks

This diagram shows how MemInsight is evaluated on LLM-REDIAL and LoCoMo from augmentation through retrieval to task metrics.

PROCESS

How MemInsight Handles a Multi-Session Dialogue

  1. Attribute Mining

     MemInsight applies Attribute Mining to each new interaction, extracting entity-centric and conversation-centric attributes guided by perspective and granularity.

  2. Attribute Granularity

     MemInsight chooses turn-level or session-level Attribute Granularity, generating complementary annotations such as events, emotions, and intents over dialogue spans.

  3. Annotation and Attribute Prioritization

     MemInsight runs Annotation and Attribute Prioritization to attach attribute-value pairs to memory instances and sort them into Basic or Priority augmentation.

  4. Memory Retrieval

     MemInsight uses Memory Retrieval with attribute-based or embedding-based retrieval to fetch the top-k augmented memories and integrate them into the current context.
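The four steps above can be sketched end to end. The LLM calls for mining and annotation are replaced by a toy rule, and the function names are our own, not the paper's API; this is a hedged sketch of the control flow, not the actual implementation.

```python
def attribute_mining(turn):
    """Steps 1-2: extract (attribute, value) pairs at turn granularity (toy rule)."""
    pairs = {}
    if "birthday" in turn.lower():
        pairs["event"] = "birthday"
    if "excited" in turn.lower():
        pairs["emotion"] = "excitement"
    return pairs

def annotate(memory, turn):
    """Step 3: attach attribute-value pairs to a new memory instance."""
    memory.append({"text": turn, "attributes": attribute_mining(turn)})

def retrieve(memory, wanted, k=2):
    """Step 4: attribute-based retrieval of the top-k matching memories."""
    scored = [(len(wanted.items() & m["attributes"].items()), m) for m in memory]
    scored.sort(key=lambda s: -s[0])
    return [m for score, m in scored[:k] if score > 0]

memory = []
annotate(memory, "I'm so excited about my birthday party!")
annotate(memory, "The weather was awful today.")
hits = retrieve(memory, {"event": "birthday"})
print(hits[0]["text"])  # "I'm so excited about my birthday party!"
```

The key design point is that annotation happens once, at write time, so retrieval at read time only touches the compact attribute index rather than re-reading every dialogue turn.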

KEY CONTRIBUTIONS

Key Contributions

  • Structured Autonomous Memory Augmentation

    MemInsight introduces Attribute Mining and Annotation and Attribute Prioritization to adapt memory representations while preserving context across extended conversations and tasks.

  • Augmentation-Aware Memory Retrieval

    MemInsight designs Memory Retrieval methods, including attribute-based retrieval and embedding-based retrieval, to filter irrelevant memory while retaining key historical insights.

  • Multi-Task Empirical Validation

    MemInsight achieves 60.5% Recall@5 on LoCoMo versus 26.5% for DPR and improves persuasiveness by up to 14% on LLM-REDIAL conversational recommendation.
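The embedding-based retrieval variant can be illustrated with a self-contained sketch: embed the attribute annotations and rank memories by cosine similarity to the augmented query. The bag-of-words "embedding" below is a toy stand-in for a real encoder, and all names are our assumptions.

```python
import math
from collections import Counter

def embed(attrs):
    """Toy embedding: bag-of-words over "attribute=value" tokens."""
    return Counter(f"{k}={v}" for k, v in attrs.items())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_attrs, memory, k=3):
    """Rank augmented memories by similarity to the augmented query."""
    q = embed(query_attrs)
    ranked = sorted(memory, key=lambda m: cosine(q, embed(m["attributes"])),
                    reverse=True)
    return ranked[:k]

memory = [
    {"text": "Loved a noir thriller.",
     "attributes": {"genre": "thriller", "emotion": "joy"}},
    {"text": "Bored by a rom-com.",
     "attributes": {"genre": "romance", "emotion": "boredom"}},
]
best = top_k({"genre": "thriller"}, memory, k=1)[0]
print(best["text"])  # "Loved a noir thriller."
```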

RESULTS

By the Numbers

  • Recall@5: 60.5% (+34.0 points over the DPR RAG baseline)
  • F1 (overall): 30.1% (+1.4 over the DPR RAG baseline)
  • Genre match Recall@1: 0.400 (+0.080 over the Claude-3-Sonnet baseline)
  • Avg. attributes: 7.39 per movie in LLM-REDIAL memory augmentation

On the LoCoMo question answering benchmark, which tests single-hop, multi-hop, temporal, open-domain, and adversarial questions, MemInsight's 60.5% Recall@5 versus 26.5% for DPR shows that attribute-guided augmentation greatly improves retrieval. On LLM-REDIAL conversational recommendation, MemInsight's 0.400 genre match Recall@1 versus 0.320 for the Claude-3-Sonnet baseline demonstrates stronger alignment between recommended and ground-truth genres.
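For readers unfamiliar with the headline metric, Recall@k is the fraction of questions whose gold evidence appears among the top-k retrieved memories. The sketch below uses toy data; the numbers are illustrative, not the paper's.

```python
def recall_at_k(retrieved, gold, k=5):
    """Fraction of queries whose gold item appears in the top-k retrieved list."""
    hits = sum(1 for r, g in zip(retrieved, gold) if g in r[:k])
    return hits / len(gold)

# Three queries: gold evidence found for the first and third only.
retrieved = [["m1", "m7", "m3"], ["m2", "m4"], ["m9", "m5", "m6"]]
gold = ["m3", "m8", "m5"]
print(round(recall_at_k(retrieved, gold, k=5), 3))  # 0.667
```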


BENCHMARK

Recall@5 on LoCoMo question answering for embedding-based retrieval methods.

KEY INSIGHT

The Counterintuitive Finding

MemInsight retrieves only 10 memory items with embedding-based retrieval yet reaches 0.400 genre match Recall@1, beating a 144-item baseline at 0.320.

This is surprising because more retrieved context is often assumed to help, but MemInsight shows that fewer, attribute-filtered memories can be both leaner and more effective.

WHY IT MATTERS

What this unlocks for the field

MemInsight unlocks scalable, semantically structured long-term memory, where agents can autonomously mine, prioritize, and retrieve attributes across sessions.

Builders can now plug MemInsight into LLM agents to get attribute-aware RAG that supports multi-hop reasoning, persuasive recommendation, and event summarization without hand-crafted schemas.


Related papers

RAG

A Dynamic Retrieval-Augmented Generation System with Selective Memory and Remembrance

Okan Bursa · 2026

Adaptive RAG Memory (ARM) augments a standard retriever–generator stack with a Dynamic Embedding Layer and Remembrance Engine that track usage statistics and apply selective remembrance and decay to embeddings. On a lightweight retrieval benchmark, ARM achieves NDCG@5 ≈ 0.9401 and Recall@5 = 1.000 with 22M parameters, matching larger baselines like gte-small while providing the best efficiency among ultra-efficient models.

RAG · Long-Term Memory

HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues

Yijie Zhong, Yunfan Gao, Haofen Wang · 2026

HingeMem combines Boundary Guided Long-Term Memory, Dialogue Boundary Extraction, Memory Construction, Query Adaptive Retrieval, Hyperedge Rerank, and Adaptive Stop to segment dialogues into element-indexed hyperedges and plan query-specific retrieval. On LOCOMO, HingeMem achieves 63.9 overall F1 and 75.1 LLM-as-a-Judge score, surpassing the best baseline Zep (56.9 F1) by 7.0 F1 without using category-specific QA formats.
