Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs

Authors: Zheng Wang, Zhongyang Li, Zeren Jiang et al.

2024

TL;DR

EMG-RAG uses an RL-driven Editable Memory Graph over smartphone memories to adaptively select evidence, yielding +11.83 BLEU over M-RAG on question answering.



THE PROBLEM

Personal agents need editable, selectable memories from 0.35B items

EMG-RAG targets smartphone assistants where 0.35 billion memories are distilled from 11.35 billion raw texts, yet existing RAG pipelines rely on a fixed Top-K retriever or stuff the context window with a giant haystack of memories.

In these assistants, personal tasks like reminders, autofill, and QA break when memories cannot be edited or combined adaptively, degrading user experience despite powerful LLMs.

HOW IT WORKS

EMG-RAG — Editable Memory Graphs plus RL-based RAG

EMG-RAG centers on Editable Memory Graphs, a three-layer structure comprising a Memory Type Layer, a Memory Subclass Layer, and a Memory Graph Layer, and trains memory selection on EMGs as an MDP using QA labels derived from its data-collection pipeline.

Conceptually, EMG-RAG turns the user’s phone into a graph-organized filing cabinet, where RL acts like a librarian walking the graph to pull just the right folders.

This design lets EMG-RAG support insertion, deletion, and replacement on memories while adaptively traversing the graph, something a flat context window or fixed Top-K retriever cannot achieve.
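The three-layer structure and its edit operations can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the class name, the dictionary-of-dictionaries layout, and all method names are assumptions chosen to mirror the layers and the insertion/deletion/replacement operations described above.

```python
# Hypothetical sketch of an Editable Memory Graph (EMG). The three nesting
# levels stand in for the Memory Type Layer, Memory Subclass Layer, and
# Memory Graph Layer; names are illustrative, not the authors' code.

class EMG:
    def __init__(self):
        # {memory_type: {subclass: {memory_id: memory_text}}}
        self.graph = {}

    def insert(self, mtype, subclass, memory_id, text):
        # Insertion: add a new memory node under its type and subclass.
        self.graph.setdefault(mtype, {}).setdefault(subclass, {})[memory_id] = text

    def delete(self, mtype, subclass, memory_id):
        # Deletion: drop a stale memory node if it exists.
        self.graph.get(mtype, {}).get(subclass, {}).pop(memory_id, None)

    def replace(self, mtype, subclass, memory_id, new_text):
        # Replacement: overwrite an existing node with updated content.
        bucket = self.graph.get(mtype, {}).get(subclass, {})
        if memory_id in bucket:
            bucket[memory_id] = new_text


# Usage: user edits propagate directly to the graph.
emg = EMG()
emg.insert("profile", "address", "m1", "Lives at 12 Oak St.")
emg.replace("profile", "address", "m1", "Lives at 98 Pine Ave.")
emg.delete("profile", "address", "m1")
```

The key design point is that edits touch individual graph nodes, so the retriever always traverses the current state of memory rather than a stale flat context.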

DIAGRAM

Memory selection as an MDP over the Editable Memory Graph

This diagram shows how EMG-RAG activates nodes, traverses the Editable Memory Graph with an RL agent, and accumulates rewards from LLM answer quality.

DIAGRAM

Data collection and evaluation pipeline for EMG-RAG

This diagram shows how EMG-RAG builds memories and QA labels from raw logs, then evaluates across QA, autofill forms, and user services.

PROCESS

How EMG-RAG Handles a Smartphone Assistant Question

  1. 01

    Data Collection

    EMG-RAG gathers conversations and screenshots, applies OCR and GPT-4 to extract a memory set M, and generates QA pairs annotated with their required memories as supervision.

  2. 02

    Editable Memory Graphs

    EMG-RAG organizes memories into the Memory Type Layer, Memory Subclass Layer, and Memory Graph Layer, enabling insertion, deletion, and replacement operations.

  3. 03

    MDP for Selecting Memories on EMGs

    EMG-RAG activates Top-K nodes, builds states from cosine similarities, and trains an RL agent with warm-start and policy gradient to choose memories.

  4. 04

    Applications of the Personalized Agents

    EMG-RAG feeds selected memories plus the question into a frozen LLM to drive question answering, autofill forms, and user services like reminders and travel navigation.
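The retrieval side of the steps above can be sketched end to end. This is an illustrative Python sketch under stated assumptions: toy list embeddings, a greedy Top-K activation standing in for the trained warm-start + policy-gradient RL agent, and a plain string prompt for the frozen LLM. All function names here are hypothetical, not the authors' code.

```python
# Illustrative sketch of EMG-RAG's selection step (not the authors' code):
# activate Top-K memory nodes by cosine similarity to the question, then
# feed the selected memories plus the question to a frozen LLM.
import math

def cosine(a, b):
    # State features in the MDP are built from similarities like this one.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def activate_top_k(question_vec, node_vecs, k=3):
    # Greedy stand-in for the RL policy: the paper instead trains an agent
    # (warm-start + policy gradient) whose reward comes from LLM answer quality.
    scored = sorted(node_vecs.items(),
                    key=lambda kv: cosine(question_vec, kv[1]),
                    reverse=True)
    return [node_id for node_id, _ in scored[:k]]

def build_prompt(question, memories):
    # Selected memories plus the question go to a frozen LLM unchanged.
    context = "\n".join(f"- {m}" for m in memories)
    return f"Known user memories:\n{context}\n\nQuestion: {question}"
```

A usage example: with toy 2-D embeddings, `activate_top_k([1.0, 0.0], {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [0.9, 0.1]}, k=2)` activates the two nodes most aligned with the question, and `build_prompt` packages them for generation.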

KEY CONTRIBUTIONS

Key Contributions

  • 01

    Novel task of crafting LLM-driven personalized agents

    EMG-RAG defines smartphone-based personalized agents that must satisfy Editability and Selectability, using real assistant logs and GPT-4 generated QA pairs over 0.35 billion memories.

  • 02

    EMG-RAG combining EMG and RAG with RL

    EMG-RAG introduces Editable Memory Graphs plus an MDP for Selecting Memories on EMGs, enabling end-to-end optimization of retrieval with reinforcement learning on a frozen LLM.

  • 03

    Extensive experiments on real world business dataset

    EMG-RAG achieves approximately 10.6%, 9.5%, and 9.7% gains over M-RAG on question answering, autofill forms, and user services, respectively, under four weeks of continuous memory edits.

RESULTS

By the Numbers

BLEU

75.99

+11.83 over M-RAG

ROUGE-L

88.06

+3.32 over M-RAG

Autofill Forms EM

92.86%

+2.0 over M-RAG

User Services Travel EM

96.43%

+2.68 over M-RAG

On a real AI assistant dataset with 2,000 training users and 500 test users, EMG-RAG with GPT-4 is evaluated on question answering, autofill forms, and user services. The gains over M-RAG show that Editable Memory Graphs plus RL-based selection materially improve downstream personalization quality.

BENCHMARK

Effectiveness of EMG-RAG in question answering with GPT-4

BLEU on question answering for different RAG methods combined with GPT-4.

BENCHMARK

Ablation study on EMG-RAG components (BLEU)

BLEU on question answering for EMG-RAG and ablations removing activated nodes, warm-start, or policy gradient.

KEY INSIGHT

The Counterintuitive Finding

EMG-RAG with K=3 activated nodes reaches 88.06 ROUGE-L and 2.14s inference, while K=5 does not improve ROUGE-L and costs 3.32s.

This is surprising because intuition says more activated nodes should improve retrieval, but EMG-RAG shows that the extra memories mainly add noise and latency rather than better answers.

WHY IT MATTERS

What this unlocks for the field

EMG-RAG shows that personal agents can maintain an editable, graph-structured memory that is traversed by an RL policy tuned directly on answer quality.

Builders can now design assistants that survive continuous user edits, combine multiple memories across types, and still stay within realistic latency and context limits.


Related papers

RAG

A Dynamic Retrieval-Augmented Generation System with Selective Memory and Remembrance

Okan Bursa · 2026

Adaptive RAG Memory (ARM) augments a standard retriever–generator stack with a Dynamic Embedding Layer and Remembrance Engine that track usage statistics and apply selective remembrance and decay to embeddings. On a lightweight retrieval benchmark, ARM achieves NDCG@5 ≈ 0.9401 and Recall@5 = 1.000 with 22M parameters, matching larger baselines like gte-small while providing the best efficiency among ultra-efficient models.

RAGLong-Term Memory

HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues

Yijie Zhong, Yunfan Gao, Haofen Wang · 2026

HingeMem combines Boundary Guided Long-Term Memory, Dialogue Boundary Extraction, Memory Construction, Query Adaptive Retrieval, Hyperedge Rerank, and Adaptive Stop to segment dialogues into element-indexed hyperedges and plan query-specific retrieval. On LOCOMO, HingeMem achieves 63.9 overall F1 and 75.1 LLM-as-a-Judge score, surpassing the best baseline Zep (56.9 F1) by 7.0 F1 without using category-specific QA formats.
