D-Mem: A Dual-Process Memory System for LLM Agents

Authors: Zhixing You, Jiachen Yuan, Jason Cai

2026

TL;DR

D-Mem uses Multi-dimensional Quality Gating to route between Mem0∗ and Full Deliberation, reaching 53.5 F1 on LoCoMo while recovering 96.7% of the 55.3 F1 upper bound.

THE PROBLEM

Long-horizon agents lose fine-grained context due to lossy abstraction

D-Mem targets retrieval frameworks that “strip away potentially crucial contextual nuances,” leaving static retrieval unable to reconstruct logical chains lost during compression.

With LoCoMo dialogues averaging 24K tokens, simply feeding the full history into the context window is expensive and worsens the Lost-in-the-Middle effect, degrading deep reasoning and temporal logic.

HOW IT WORKS

D-Mem — Dual-process memory with gated deliberation

D-Mem centers on Mem0∗, Quality Gating, and Full Deliberation, combining incremental vector memory with a query-guided raw-history fallback.

You can think of Mem0∗ as fast RAM for routine recall, while Full Deliberation acts like a slower disk scan that rereads the entire log when needed.

This design lets D-Mem recover nuanced temporal and multi-hop dependencies that a plain context window or static top-K retrieval cannot preserve.
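
The RAM-versus-disk analogy also suggests a simple cost model: the fast path always runs, and the expensive reread is paid only when the gate escalates. A hypothetical sketch (the per-path costs and escalation rate below are illustrative assumptions, not values reported in the paper):

```python
def expected_tokens_per_query(fast_cost: float, slow_cost: float,
                              escalation_rate: float) -> float:
    """Expected token cost when the slow reread runs only on gate failure."""
    return fast_cost + escalation_rate * slow_cost

# Illustrative numbers only: if the fast path costs ~2k tokens, a full
# reread ~35k, and the gate escalates 30% of queries:
print(expected_tokens_per_query(2_000, 35_000, 0.3))  # 12500.0
```

The lower the escalation rate the gate can sustain without losing accuracy, the closer the system gets to fast-path cost at near-slow-path quality.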

DIAGRAM

Query-time dual-process flow in D-Mem

This diagram shows how D-Mem routes a query through Mem0∗, Quality Gating, and Full Deliberation at inference time.

DIAGRAM

Evaluation setup across LoCoMo and RealTalk

This diagram shows how D-Mem is evaluated on LoCoMo and RealTalk with GPT-4o-mini and Qwen3-235B-Instruct, including baselines and metrics.

PROCESS

How D-Mem Handles a LoCoMo Question

  1. Mem0∗: The System 1 Retrieval Foundation

    D-Mem uses Mem0∗ to retrieve the top 30 most similar memories C from the vector database and generate an initial answer Ainit for the query.

  2. Gated Deliberation Policies

    D-Mem applies Quality Gating to evaluate Ainit against the query and context along three dimensions: Relevance, Faithfulness and Consistency, and Completeness.

  3. Full Deliberation

    If Quality Gating fails, D-Mem triggers Full Deliberation to chunk the full conversation, extract scored facts, and filter them into an enhanced context C'.

  4. Answer Generation

    Using either C or C', D-Mem calls the backbone LLM to produce the final answer, which is evaluated with F1, BLEU, and LLM-as-a-Judge metrics.
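
The four steps above can be sketched as a single routine. This is a minimal, self-contained illustration: the components (`retrieve`, `generate`, `judge`, `deliberate`) stand in for D-Mem's actual modules, and the 0.7 threshold is an assumption, not a value from the paper.

```python
GATE_DIMENSIONS = ("relevance", "faithfulness_consistency", "completeness")

def quality_gate(scores: dict, threshold: float = 0.7) -> bool:
    """Pass only if every gating dimension clears the threshold."""
    return all(scores[d] >= threshold for d in GATE_DIMENSIONS)

def d_mem_answer(query, retrieve, generate, judge, deliberate):
    context = retrieve(query, k=30)      # Step 1: Mem0* top-30 memories C
    a_init = generate(query, context)    # ... and an initial answer Ainit
    if quality_gate(judge(query, context, a_init)):
        return a_init                    # Steps 2/4: gate passes, fast path
    enhanced = deliberate(query)         # Step 3: Full Deliberation -> C'
    return generate(query, enhanced)     # Step 4: final answer from C'
```

The key design point is that the expensive `deliberate` call sits behind the gate, so routine queries never pay for a full reread.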

KEY CONTRIBUTIONS

Key Contributions

  • The dual-process D-Mem framework

    D-Mem integrates Mem0∗, Quality Gating, and Full Deliberation to bridge efficient vector retrieval with exhaustive deliberate reading for long-horizon reasoning.

  • Full Deliberation as a robust baseline

    D-Mem’s Full Deliberation processes raw dialogue chunk by chunk, reaching 55.3 F1 and 78.4 LLM score on LoCoMo with GPT-4o-mini.

  • High Performance with Computational Efficiency

    D-Mem’s Quality Gating attains 53.5 F1 on LoCoMo with GPT-4o-mini, recovering 96.7% of Full Deliberation’s 55.3 F1 while using only 35.8% of its tokens.
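
The chunk-by-chunk reading behind Full Deliberation can be sketched as a chunk → extract → filter loop; the chunk size, scoring function, and keep threshold below are illustrative assumptions, not values from the paper.

```python
def full_deliberation(turns, extract_facts, chunk_size=20, min_score=0.5):
    """Reread the raw dialogue in chunks, keeping only well-scored facts."""
    enhanced_context = []
    for start in range(0, len(turns), chunk_size):
        chunk = turns[start:start + chunk_size]
        for fact, score in extract_facts(chunk):  # LLM-extracted, scored facts
            if score >= min_score:                # filter into C'
                enhanced_context.append(fact)
    return enhanced_context
```

Because every chunk of the raw history is read, nothing is lost to compression, which is why this path serves as the accuracy upper bound.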

RESULTS

By the Numbers

F1: 53.5 (+2.3 over Mem0∗)

LLM-as-a-Judge: 76.3 (+3.6 over Mem0∗)

BLEU: 43.1 (+2.1 over Mem0∗)

Tokens: 12,681 (35.8% of Full Deliberation's 35,435)

All numbers are on LoCoMo with GPT-4o-mini.

On the LoCoMo benchmark, which contains 10 dialogues averaging 24K tokens and 1,540 questions, D-Mem’s Quality Gating nearly matches Full Deliberation while greatly reducing tokens. This shows D-Mem can maintain long-term reasoning fidelity without paying the full 35,435-token cost per query.
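
The headline ratios follow directly from the reported values and can be checked in two lines:

```python
f1_gated, f1_full = 53.5, 55.3            # F1 on LoCoMo (GPT-4o-mini)
tokens_gated, tokens_full = 12_681, 35_435

print(f"recovered: {f1_gated / f1_full:.1%}")           # recovered: 96.7%
print(f"token cost: {tokens_gated / tokens_full:.1%}")  # token cost: 35.8%
```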

BENCHMARK

Overall F1 on LoCoMo with GPT-4o-mini

F1 on LoCoMo for D-Mem Quality Gating versus Mem0∗, Nemori, and Full Context.

KEY INSIGHT

The Counterintuitive Finding

D-Mem’s Quality Gating recovers 96.7% of Full Deliberation’s 55.3 F1 on LoCoMo while using only 12,681 tokens versus 35,435 tokens.

This is surprising because exhaustive Full Deliberation increases tokens and inference time by over 10×, yet D-Mem achieves nearly the same accuracy without paying that cost on every query.

WHY IT MATTERS

What this unlocks for the field

D-Mem makes it practical to combine lightweight vector memories with selective, high-fidelity raw-history reading for long-horizon agents.

Builders can now deploy agents that keep months of dialogue, answer temporal and multi-hop questions robustly, and still stay within tight latency and token budgets.

Related papers

Agent Memory · Long-Term Memory

Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents

Yi Yu, Liuyi Yao et al.

arXiv 2026

Agentic Memory (AgeMem) exposes memory management tools, a three-stage progressive RL strategy, and step-wise GRPO directly inside the agent policy to jointly control long-term and short-term memory. On Qwen3-4B-Instruct, AgeMem attains 54.31% average performance across ALFWorld, SciWorld, PDDL, BabyAI, and HotpotQA, exceeding the best baseline A-Mem at 45.74%.

Agent Memory

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Cheng Jiayang, Dongyu Ru et al.

2026

AMemGym combines Structured Data Generation, On-Policy Interaction, Evaluation Metrics, and Meta-Evaluation to script user state trajectories, drive LLM-simulated role-play, and score write–read–utilization behavior. On AMemGym’s base configuration, AWE-(2,4,30) reaches a 0.291 normalized memory score on interactive evaluation, while native gpt-4.1-mini only achieves 0.203, exposing substantial gaps between memory agents and plain long-context LLMs.

Agent Memory

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Emmanuel Bamidele

2026

AMV-L manages agent memory using a Memory Value Model, Tiered Lifecycle, Bounded Retrieval Path, and Lifecycle Manager to decouple retention from retrieval eligibility. Under a 70k-request long-running workload, AMV-L improves throughput from 9.027 to 36.977 req/s over TTL and reduces p99 latency from 5398.167 ms to 1233.430 ms while matching LRU’s retrieval quality.
