MemFactory: Unified Inference & Training Framework for Agent Memory

Authors: Ziliang Guo, Ziheng Li, Bo Tang et al.

2026

TL;DR

MemFactory uses a modular Module–Agent–Environment–Trainer stack with built-in GRPO to improve MemAgent-style recurrent memory by up to a 14.8% relative gain in average score.



THE PROBLEM

Memory RL research is fragmented and hard to reproduce

MemFactory targets fragmented Memory RL implementations that are highly customized, task-specific, and scattered across isolated repositories, making reproduction and extension difficult.

This fragmentation means systems like Memory-R1, MemAgent, and RMM cannot easily swap modules, slowing progress on long-term memory, retrieval, and policy optimization research.

HOW IT WORKS

MemFactory framework — modular layers plus GRPO

MemFactory’s core mechanism is a four-layer stack: Module Layer, Agent Layer, Environment Layer, and Trainer Layer, with a RecurrentMemoryModule for end-to-end memory.

You can think of MemFactory like Lego blocks plus an RL engine: modules are bricks, the Agent Layer is the assembly, and GRPO is the tuning knob.

This layered design lets MemFactory learn extraction, updating, retrieval, and recurrent memory policies that a plain context window or static RAG pipeline cannot express.
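To make the module abstraction concrete, here is a minimal sketch of what the Module Layer could look like. The class and method names (Extractor-style modules, RecurrentMemoryModule, and the generate/rollout/inference interfaces) come from the article's description, but the signatures and the toy memory-truncation logic are illustrative assumptions, not MemFactory's actual API.

```python
# Hypothetical sketch of MemFactory's Module Layer; signatures are assumed.
from abc import ABC, abstractmethod


class MemoryModule(ABC):
    """Base brick: every module exposes the same three entry points."""

    @abstractmethod
    def generate(self, prompt: str) -> str: ...

    @abstractmethod
    def rollout(self, state: dict) -> list: ...

    @abstractmethod
    def inference(self, query: str, memory: str) -> str: ...


class RecurrentMemoryModule(MemoryModule):
    """MemAgent-style module: fold each text chunk into a bounded memory."""

    def __init__(self, max_memory_chars: int = 1024):
        self.max_memory_chars = max_memory_chars

    def generate(self, prompt: str) -> str:
        # Placeholder for an LLM call that rewrites memory given a new chunk;
        # here we just keep the most recent characters.
        return prompt[-self.max_memory_chars:]

    def rollout(self, state: dict) -> list:
        # Process chunks sequentially, carrying the memory forward each step.
        memory = ""
        trajectory = []
        for chunk in state["chunks"]:
            memory = self.generate(memory + "\n" + chunk)
            trajectory.append(memory)
        return trajectory

    def inference(self, query: str, memory: str) -> str:
        # Placeholder: answer the query conditioned on the final memory.
        return f"answer({query!r}) using {len(memory)} chars of memory"
```

The point of the shared base class is the Lego-block property described above: an agent assembled against `MemoryModule` can swap in any extractor, updater, retriever, or recurrent module without changing the rollout loop.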

DIAGRAM

Agent rollout and environment interaction flow

This diagram shows how MemFactory agents roll out trajectories, interact with the Environment Layer, and receive GRPO rewards during training.

DIAGRAM

MemFactory training and evaluation pipeline

This diagram shows how MemFactory loads MemAgent data, trains with GRPO, and evaluates on eval_50, eval_100, and eval_fwe_16384.

PROCESS

How MemFactory Handles a Memory-Augmented Training Session

  1. Module Layer

    MemFactory configures Extractor, Updater, Retriever, and RecurrentMemoryModule implementations, exposing generate, rollout, and inference interfaces for memory operations.

  2. Agent Layer

    MemFactory assembles modules into an agent that executes policies, performs rollouts, and loads pre-trained Qwen3 checkpoints with FlashAttention-2.

  3. Environment Layer

    MemFactory converts MemAgent datasets into standardized states, maintains MemoryBankEnv or LongcontextEnv, and computes Format and LLM-as-a-Judge rewards.

  4. Trainer Layer

    MemFactory runs GRPO, sampling grouped trajectories, computing advantages without a critic, and updating the agent policy while logging metrics via SwanLab.
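The Trainer Layer's "advantages without a critic" step can be sketched with the standard GRPO formulation: sample a group of trajectories for the same prompt, then normalize each trajectory's reward against the group mean and standard deviation. This is the generic GRPO recipe, not MemFactory-specific code; reward weighting and SwanLab logging are omitted.

```python
# Minimal sketch of GRPO's critic-free, group-relative advantage computation.
from statistics import mean, stdev


def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Advantage of each trajectory relative to its sampling group."""
    mu = mean(group_rewards)
    # Sample std of the group; degenerate single-trajectory groups get 0.
    sigma = stdev(group_rewards) if len(group_rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in group_rewards]


# Example: four rollouts of the same prompt with mixed rewards.
# Above-average trajectories get positive advantage, below-average negative.
adv = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

Because the baseline is the group mean rather than a learned value function, no critic network needs to be trained or stored, which is what keeps the trainer single-GPU friendly.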

KEY CONTRIBUTIONS

Key Contributions

  • Unified Memory RL Infrastructure

    MemFactory standardizes MemoryBankEnv, LongcontextEnv, and RecurrentMemoryModule within a four-layer stack, unifying training, evaluation, and inference for memory-augmented agents.

  • Highly Modular and Extensible Design

    MemFactory exposes Extractor, Updater, Retriever, and Agent Module classes with generate, rollout, and inference interfaces, enabling Lego-like assembly of agents in the style of Memory-R1, MemAgent, and RMM.

  • Empirical Validation on MemAgent

    MemFactory trains MemAgent-style agents on Qwen3-1.7B and Qwen3-4B-Instruct, achieving up to a 14.8% relative improvement in average score over base checkpoints.

RESULTS

By the Numbers

eval_50

0.5684 score

+0.0957 over Qwen3-1.7B Base checkpoint

eval_100

0.4863 score

+0.0566 over Qwen3-1.7B Base checkpoint

eval_fwe_16384

0.6426 score

+0.0156 over Qwen3-4B-Instruct Base checkpoint

Average

0.3581 score

+0.0463 over Qwen3-1.7B Base checkpoint

On MemAgent eval_50, eval_100, and eval_fwe_16384, MemFactory improves both Qwen3-1.7B and Qwen3-4B-Instruct, showing that MemFactory effectively optimizes recurrent memory policies via GRPO.
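The headline 14.8% figure follows directly from the numbers above: a trained average of 0.3581 with a +0.0463 gain implies a base average of 0.3118, and the gain relative to that base works out to roughly 14.8%.

```python
# Arithmetic check of the headline number from the results table.
trained_avg = 0.3581   # MemFactory average score (Qwen3-1.7B)
gain = 0.0463          # absolute gain over the base checkpoint
base_avg = trained_avg - gain        # 0.3118
relative = gain / base_avg           # ≈ 0.148
print(f"{relative:.1%}")             # → 14.8%
```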

BENCHMARK

Performance of MemoryAgent trained via MemFactory on three test sets

Average score (avg@4) on MemAgent eval_50, eval_100, and eval_fwe_16384 for Qwen3 base checkpoints versus MemFactory RL.

KEY INSIGHT

The Counterintuitive Finding

MemFactory slightly reduces Qwen3-1.7B performance on the out-of-distribution (OOD) eval_fwe_16384 set, from 0.0332 to 0.0195, despite large gains on the main tasks.

This is surprising because RL-tuned recurrent memory policies might be expected to generalize better; instead, MemFactory shows that stronger in-distribution optimization can hurt OOD robustness.

WHY IT MATTERS

What this unlocks for the field

MemFactory unlocks a reusable GRPO-based stack where modules in the style of Memory-R1, MemAgent, and RMM can be mixed, matched, and trained without bespoke pipelines.

Builders can now prototype new memory extractors, updaters, retrievers, or recurrent modules and benchmark them end-to-end on MemAgent-style datasets using a single, GPU-friendly framework.


Related papers

Agent Memory · Long-Term Memory

Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents

Yi Yu, Liuyi Yao et al.

arXiv 2026

Agentic Memory (AgeMem) exposes memory management tools, a three-stage progressive RL strategy, and step-wise GRPO directly inside the agent policy to jointly control long-term and short-term memory. On Qwen3-4B-Instruct, AgeMem attains 54.31% average performance across ALFWorld, SciWorld, PDDL, BabyAI, and HotpotQA, exceeding the best baseline A-Mem at 45.74%.

Agent Memory

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Cheng Jiayang, Dongyu Ru et al.

2026

AMemGym combines Structured Data Generation, On-Policy Interaction, Evaluation Metrics, and Meta-Evaluation to script user state trajectories, drive LLM-simulated role-play, and score write–read–utilization behavior. On AMemGym’s base configuration, AWE-(2,4,30) reaches a 0.291 normalized memory score on interactive evaluation, while native gpt-4.1-mini only achieves 0.203, exposing substantial gaps between memory agents and plain long-context LLMs.

Agent Memory

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Emmanuel Bamidele

2026

AMV-L manages agent memory using a Memory Value Model, Tiered Lifecycle, Bounded Retrieval Path, and Lifecycle Manager to decouple retention from retrieval eligibility. Under a 70k-request long-running workload, AMV-L improves throughput from 9.027 to 36.977 req/s over TTL and reduces p99 latency from 5398.167 ms to 1233.430 ms while matching LRU’s retrieval quality.
