MR.Rec: Synergizing Memory and Reasoning for Personalized Recommendation Assistant with LLMs

Authors: Jiani Huang, Xingchen Zou, Lianghao Xia, Qing Li

2025

TL;DR

MR.Rec combines hierarchical RAG memory with reinforcement-learned, reasoning-enhanced retrieval to reach NDCG@100 = 0.113 vs 0.104 for the best baseline (+8.65%).


THE PROBLEM

LLM recommenders are limited by context windows and single-turn reasoning

MR.Rec targets LLM recommenders that are “constrained by limited context windows and single-turn reasoning,” which blocks deep personalization and dynamic preference tracking.

These limits prevent interactive assistants from using rich histories and implicit preferences, so recommendation quality drops and explanations stay shallow and query-only.

HOW IT WORKS

MR.Rec: Hierarchical memory plus reasoning-enhanced retrieval

MR.Rec’s core mechanism links User-specific Local Memory, Cross-user Global Memory, Reasoning-enhanced Memory Retrieval, and Retrieval-Augmented Item Generation under Reinforcement Learning for Memory-synergized Reasoning.

You can think of MR.Rec as a CPU with a smart cache: global memory is shared knowledge, local memory is the per-user cache, and the reasoning LLM is the controller deciding what to load.

This design lets MR.Rec iteratively fetch only useful memories, reason over them, and refine recommendations in ways a fixed LLM context window cannot match.
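
To make the cache analogy concrete, here is a minimal sketch of how the two memory tiers could be represented. The class names and fields below are illustrative assumptions, not the paper's implementation.

```python
# Illustrative only: MR.Rec's hierarchical memory modeled as two stores.
from dataclasses import dataclass, field

@dataclass
class LocalMemory:
    """User-specific Local Memory: behavior records, preference patterns, profile."""
    user_id: str
    behavior_records: list[str] = field(default_factory=list)
    preference_patterns: list[str] = field(default_factory=list)
    profile: str = ""

@dataclass
class GlobalMemory:
    """Cross-user Global Memory: shared knowledge indexed per recommendation scenario."""
    scenario_knowledge: dict[str, str] = field(default_factory=dict)

    def lookup(self, scenario: str) -> str:
        # Shared "RAM": the reasoning LLM decides which scenario entry
        # to load into its working context.
        return self.scenario_knowledge.get(scenario, "")
```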

DIAGRAM

Multi-turn reasoning-enhanced memory retrieval in MR.Rec

This diagram shows how MR.Rec interleaves scenario analysis, aspect reasoning, and staged memory retrieval during a multi-turn recommendation session.
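
A hypothetical rendering of that loop in code: the LLM alternates reasoning (scenario analysis, aspect inference) with staged retrieval until it decides it has enough evidence. The `llm` callable, the `retrieve_local` helper, and the prompt strings are placeholders, not MR.Rec's actual interfaces.

```python
# Sketch of reasoning-enhanced memory retrieval as a multi-turn loop.
def reasoning_enhanced_retrieval(llm, query, global_mem, local_mem, max_turns=3):
    # Step 1: scenario analysis grounded in Cross-user Global Memory
    scenario = llm(f"Identify the recommendation scenario for: {query}")
    context = global_mem.lookup(scenario)

    # Step 2: infer preference aspects A_q for this query
    aspects = llm(f"Given {context}, list preference aspects for: {query}")

    # Step 3: staged, selective retrieval from User-specific Local Memory
    evidence = []
    for _ in range(max_turns):
        request = llm(f"Aspects: {aspects}. Evidence so far: {evidence}. "
                      f"What memory should be fetched next, or say DONE?")
        if "DONE" in request:
            break
        evidence.extend(retrieve_local(local_mem, request))

    return aspects, evidence  # A_q and the retrieved memory M̂_u(q)
```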

DIAGRAM

Training loop for memory-synergized reasoning with GRPO

This diagram shows how MR.Rec uses Group Relative Policy Optimization and multi-component rewards to tune memory usage and reasoning quality.
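
A minimal sketch of that training signal, assuming equal weights across the three reward components; the weighting here is illustrative, and the paper's exact reward shaping is not reproduced.

```python
import numpy as np

def combined_reward(format_ok: bool, ndcg_at_k: float, memory_utilization: float) -> float:
    # Format reward gates on valid output structure; the recommendation reward
    # is nDCG-based; the memory-utilization reward encourages grounding answers
    # in retrieved memory. Equal weights are an assumption for this sketch.
    return 1.0 * float(format_ok) + 1.0 * ndcg_at_k + 1.0 * memory_utilization

def grpo_advantages(rewards: list[float]) -> np.ndarray:
    # Group Relative Policy Optimization: advantages are rewards normalized
    # within a group of rollouts for the same query (no learned value critic).
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Example: four sampled rollouts for one query
rollout_rewards = [combined_reward(True, 0.11, 0.8),
                   combined_reward(True, 0.05, 0.6),
                   combined_reward(False, 0.02, 0.1),
                   combined_reward(True, 0.09, 0.9)]
print(grpo_advantages(rollout_rewards))
```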

PROCESS

How MR.Rec Handles a Recommendation Query Session

  1. Memory Indexing for the RAG Recommender

     MR.Rec first builds User-specific Local Memory from behavior records, preference patterns, and user profiles, and constructs Cross-user Global Memory for each scenario.

  2. Reasoning-enhanced Memory Retrieval

     Given a query, MR.Rec performs scenario analysis using global memory, infers preference aspects A_q, and selectively retrieves local memory segments M̂_u(q).

  3. Retrieval-Augmented Item Generation

     MR.Rec feeds the query, inferred aspects, and retrieved memory into Retrieval-Augmented Item Generation to produce an ideal item profile I_u(q).

  4. Reinforcement Learning for Memory-synergized Reasoning

     MR.Rec uses multi-turn GRPO with format, recommendation, and memory-utilization rewards to refine the LLM policy for retrieval and reasoning; an end-to-end sketch of the session follows this list.
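
Putting the four steps together, here is a compressed end-to-end sketch of one session, reusing the placeholder components from the earlier snippets. The `embed` function, `catalog`, and `item.text` are additional illustrative assumptions; ranking the catalog by embedding similarity to the generated profile stands in for whatever dense retriever is actually used.

```python
def recommend(llm, query, global_mem, local_mem, catalog, embed, k=100):
    # Steps 1-2: reasoning-enhanced retrieval over the indexed memory
    aspects, evidence = reasoning_enhanced_retrieval(llm, query, global_mem, local_mem)

    # Step 3: retrieval-augmented item generation -> ideal item profile I_u(q)
    ideal_profile = llm(f"Query: {query}\nAspects: {aspects}\nMemory: {evidence}\n"
                        f"Describe the ideal item for this user.")

    # Rank catalog items by similarity to the generated profile (illustrative)
    q_vec = embed(ideal_profile)
    scored = sorted(catalog, key=lambda item: -float(q_vec @ embed(item.text)))

    # Step 4 happens at training time: this policy is tuned with the GRPO rewards above
    return scored[:k]
```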

KEY CONTRIBUTIONS

Key Contributions

  • Comprehensive RAG system with hierarchical memory indexing

    MR.Rec introduces User-specific Local Memory and Cross-user Global Memory to compress 73,078 local and 1,970 global entries into efficient, query-aware external memory.

  • Reinforcement learning for memory-synergized reasoning

    MR.Rec applies Reinforcement Learning for Memory-synergized Reasoning with GRPO, combining format, nDCG-based recommendation, and memory-utilization rewards into a single objective; a worked sketch of the nDCG-style reward follows this list.

  • Reasoning-enhanced memory retrieval for personalization

    MR.Rec’s Reasoning-enhanced Memory Retrieval interleaves aspect inference and selective memory access, enabling multi-step, evidence-grounded reasoning beyond static chain-of-thought prompts.
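
As a concrete example of the recommendation reward, here is a standard NDCG@K computation for the single-ground-truth case. This is the textbook formulation, not necessarily the paper's exact reward code.

```python
import math

def ndcg_at_k(ranked_item_ids: list[str], target_id: str, k: int = 100) -> float:
    # With one relevant item (binary relevance), ideal DCG is 1, so NDCG reduces
    # to the discounted gain at the target's rank (0 if it misses the top k).
    for rank, item_id in enumerate(ranked_item_ids[:k], start=1):
        if item_id == target_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0

# Example: target ranked 3rd in the returned list
print(ndcg_at_k(["a", "b", "t", "d"], target_id="t"))  # = 0.5
```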

RESULTS

By the Numbers

N@100: 0.113 (+0.009 over Rec-R1)

R@100: 0.270 (+0.010 over Rec-R1)

R@10: 0.122 (+0.011 over Qwen2.5-3B w/ Naive Memory)

N@10: 0.084 (+0.009 over Rec-R1 w/ Naive Memory)

On the Amazon-C4-based recommendation benchmark across 28 categories, MR.Rec is compared against GPT-4o, DeepSeek-R1, Qwen-2.5-3B, BLAIR, and Rec-R1. The gains in NDCG@100 and Recall@100 show that MR.Rec’s memory-synergized reasoning improves both ranking quality and hit rate over the strongest tuned baseline, Rec-R1.
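
A quick sanity check of the headline relative gain, using the reported NDCG@100 values:

```python
best_baseline_ndcg = 0.104   # Rec-R1, NDCG@100
mrrec_ndcg = 0.113           # MR.Rec, NDCG@100
print(f"{(mrrec_ndcg - best_baseline_ndcg) / best_baseline_ndcg:.2%}")  # ≈ 8.65%
```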

BENCHMARK

Overall performance of baselines and MR.Rec (All categories, N@100)

NDCG@100 on the Amazon-C4-based test set, averaged over 28 categories.

KEY INSIGHT

The Counterintuitive Finding

Static user summaries sometimes hurt performance: GPT-4o with static memory gets NDCG@100 = 0.095, below its naive memory variant at 0.104.

This is surprising because summarization is expected to denoise history, but MR.Rec shows that poorly aligned static profiles can remove useful signals and mislead LLM reasoning.

WHY IT MATTERS

What this unlocks for the field

MR.Rec unlocks recommendation assistants that can reason about which preference dimensions matter, then actively fetch matching memories across users and time.

Builders can now create small, LLM-based recommenders that learn when and how to call memory, rather than stuffing long histories into prompts or relying on brittle static profiles.


Related papers

Benchmark · Agent Memory

Active Context Compression: Autonomous Memory Management in LLM Agents

Nikhil Verma · 2026

Focus Agent adds start_focus, complete_focus, a persistent Knowledge block, and an optimized Persistent Bash plus String-Replace Editor scaffold to actively compress context during long software-engineering tasks. On five hard SWE-bench Lite instances against a Baseline ReAct agent, Focus Agent achieves 22.7% token reduction (14.9M → 11.5M) while matching 3/5 = 60% task success.
