Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents

Authors: Bin Wen, Ruoxuan Zhang, Yang Chen et al.

2026

TL;DR

Neuro-Symbolic Dual Memory Framework combines neural Progress Memory and symbolic Feasibility Memory to reach 94.78% success on ALFWorld, +5.97 points over AWM.

THE PROBLEM

Long-horizon agents drift and violate feasibility in complex environments

Neuro-Symbolic Dual Memory Framework targets failures where agents fall into endless trial and error or deviate from goals due to Progress Drift and Feasibility Violations.

In ALFWorld, WebShop, and TextCraft, such drift and violations cause long trajectories, high invalid action rates, and frequent task failures for LLM-based agents.

HOW IT WORKS

Neuro-Symbolic Dual Memory Framework — Progress Memory and Feasibility Memory

Neuro-Symbolic Dual Memory Framework introduces Progress Memory, Feasibility Memory, a Blueprint Planner Agent, a Progress Monitor Agent, and an Actor Agent to decouple global progress and local feasibility.

You can think of Progress Memory as semantic RAM storing procedural blueprints, while Feasibility Memory acts like a rule-based hardware controller that blocks unsafe instructions.

This neuro-symbolic split lets Neuro-Symbolic Dual Memory Framework provide stage-aware guidance and executable Python verifiers that a plain context window cannot reliably maintain over long horizons.
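To make the split concrete, here is a minimal sketch of what the two memories might hold. Every name in it (`Blueprint`, `FeasibilityMemory`, `must_hold_before_put`, the shape of `state`) is a hypothetical illustration and not the paper's actual data structures: a blueprint is modeled as an ordered list of progress anchors with example actions per anchor, and a verifier is an ordinary Python function that accepts or rejects a candidate action.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Blueprint:
    """Hypothetical Progress Memory entry: ordered anchors plus action chunks."""
    task_type: str
    anchors: List[str]
    chunks: Dict[str, List[str]]  # anchor -> example actions seen at that stage


# A verifier inspects (state, action) and returns (is_feasible, reason).
Verifier = Callable[[dict, str], Tuple[bool, str]]


def must_hold_before_put(state: dict, action: str) -> Tuple[bool, str]:
    """Toy rule (assumed): 'put X ...' is infeasible unless X is in the inventory."""
    if action.startswith("put "):
        obj = action.split()[1]
        if obj not in state.get("inventory", []):
            return False, f"cannot put {obj}: not holding it"
    return True, "ok"


@dataclass
class FeasibilityMemory:
    """Hypothetical Feasibility Memory: a list of executable verifiers."""
    verifiers: List[Verifier] = field(default_factory=list)

    def check(self, state: dict, action: str) -> Tuple[bool, str]:
        # Reject the action as soon as any verifier objects to it.
        for verify in self.verifiers:
            ok, reason = verify(state, action)
            if not ok:
                return False, reason
        return True, "ok"


blueprint = Blueprint(
    task_type="cool_and_place",
    anchors=["take apple", "put apple in fridge"],
    chunks={"take apple": ["take apple"], "put apple in fridge": ["put apple in fridge"]},
)
feasibility = FeasibilityMemory([must_hold_before_put])
blocked = feasibility.check({"inventory": []}, "put apple in fridge")
allowed = feasibility.check({"inventory": ["apple"]}, "put apple in fridge")
print(blocked)
print(allowed)
```

The point of the sketch is that the verifier is checked by running code rather than by asking the model to remember a rule, which is the property the summary attributes to Feasibility Memory.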

DIAGRAM

Dual-alignment inference loop for long-horizon control

This diagram shows how Neuro-Symbolic Dual Memory Framework runs its dual-alignment inference loop with Progress Memory and Feasibility Memory at each timestep.

DIAGRAM

Ablation and evaluation design across benchmarks

This diagram shows how Neuro-Symbolic Dual Memory Framework is evaluated and ablated on ALFWorld, WebShop, and TextCraft.

PROCESS

How Neuro-Symbolic Dual Memory Framework Handles a Long-Horizon Task

01
Feasibility Memory Construction
Neuro-Symbolic Dual Memory Framework uses the Inductor Agent on failed transitions to build symbolic Python verifiers in Feasibility Memory from the global transition pool.
02
Progress Memory Construction
Neuro-Symbolic Dual Memory Framework applies the Distiller Agent to successful trajectories, creating procedural blueprints and anchor-aligned action chunks in Progress Memory.
03
Dual-Alignment Inference via Neuro-Symbolic Memory
During inference, the Blueprint Planner Agent and Actor Agent retrieve stage-specific chunks from Progress Memory while Feasibility Memory refines candidate actions.
04
Progress Monitor Agent
After each action, the Progress Monitor Agent updates anchor completion, advancing Neuro-Symbolic Dual Memory Framework through progress anchors when stages are satisfied.
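
The four steps above can be sketched as a single control loop. This is a toy under stated assumptions, not the paper's implementation: the LLM agents are replaced by plain Python functions, and the blueprint, chunks, verifier rules, and environment (`BLUEPRINT`, `CHUNKS`, `verifier`, `step`, `anchor_done`) are all invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical Progress Memory content: ordered anchors and per-anchor action chunks.
BLUEPRINT: List[str] = ["find apple", "take apple", "put apple in fridge"]
CHUNKS: Dict[str, List[str]] = {
    "find apple": ["go to countertop"],
    "take apple": ["put apple in fridge", "take apple"],  # first candidate is infeasible
    "put apple in fridge": ["put apple in fridge"],
}


def verifier(state: dict, action: str) -> Tuple[bool, str]:
    """Hypothetical Feasibility Memory rule set for the toy environment."""
    if action == "take apple" and state["location"] != "countertop":
        return False, "apple is not reachable from here"
    if action == "put apple in fridge" and "apple" not in state["inventory"]:
        return False, "not holding apple"
    return True, "ok"


def step(state: dict, action: str) -> None:
    """Toy environment transition (assumed dynamics)."""
    if action == "go to countertop":
        state["location"] = "countertop"
    elif action == "take apple":
        state["inventory"].append("apple")
    elif action == "put apple in fridge":
        state["inventory"].remove("apple")
        state["fridge"].append("apple")


def anchor_done(anchor: str, state: dict) -> bool:
    """Stand-in for the Progress Monitor Agent's anchor-completion check."""
    if anchor == "find apple":
        return state["location"] == "countertop"
    if anchor == "take apple":
        return "apple" in state["inventory"]
    return "apple" in state["fridge"]


def run(max_steps: int = 10) -> Tuple[List[str], bool]:
    state = {"location": None, "inventory": [], "fridge": []}
    stage, trace = 0, []
    for _ in range(max_steps):
        if stage == len(BLUEPRINT):
            break
        anchor = BLUEPRINT[stage]
        # Actor proposes stage-specific candidates; verifiers filter infeasible ones.
        action = next((a for a in CHUNKS[anchor] if verifier(state, a)[0]), None)
        if action is None:
            break  # no feasible candidate at this stage
        step(state, action)
        trace.append(action)
        # Progress monitor advances to the next anchor once the stage is satisfied.
        if anchor_done(anchor, state):
            stage += 1
    return trace, stage == len(BLUEPRINT)


trace, success = run()
print(trace, success)
```

In this toy run the infeasible `put apple in fridge` candidate at the second stage is rejected before it reaches the environment, so it never counts as an invalid action, while the anchor index keeps the agent from wandering off the blueprint.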

KEY CONTRIBUTIONS

Key Contributions

01
Dual-alignment view of long-horizon failure
Neuro-Symbolic Dual Memory Framework formalizes global progress alignment and local feasibility alignment as distinct objectives handled by Progress Memory and Feasibility Memory, explaining why single paradigms fail.
02
Neuro-symbolic dual memory framework
Neuro-Symbolic Dual Memory Framework instantiates a neural Progress Memory with stage-aware blueprints and a symbolic Feasibility Memory with executable verifiers in one unified inference loop.
03
Extensive experiments
Neuro-Symbolic Dual Memory Framework reaches 94.78% success on ALFWorld, 51% success and 0.7132 score on WebShop, and 94% success on TextCraft, with ablations isolating each memory's role.

RESULTS

By the Numbers

ALFWorld Success Rate (%)

94.78%

+5.97 over AWM

WebShop Success Rate (%)

51%

+16 over Reflexion

WebShop Score

0.7132

+0.1134 over WALL-E 2.0

TextCraft Success Rate (%)

94%

+6 over ExpeL

On ALFWorld, WebShop, and TextCraft, which test embodied control, web navigation, and compositional crafting, Neuro-Symbolic Dual Memory Framework consistently improves success and efficiency. These results show that decoupling progress and feasibility lets Neuro-Symbolic Dual Memory Framework reduce invalid actions and trajectory length while raising completion rates.
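
As a quick sanity check on the headline numbers, the baseline values implied by each reported gain can be worked out by subtraction. This assumes every gain is an absolute difference (percentage points for success rates, raw score for WebShop score); the resulting baselines are derived here and are not quoted directly in this summary.

```python
# (reported value, reported gain over the named baseline), from the figures above
reported = {
    "ALFWorld success (%) vs AWM": (94.78, 5.97),
    "WebShop success (%) vs Reflexion": (51.0, 16.0),
    "WebShop score vs WALL-E 2.0": (0.7132, 0.1134),
    "TextCraft success (%) vs ExpeL": (94.0, 6.0),
}

# Assumption: gain = ours - baseline, so baseline = ours - gain.
implied = {name: round(value - gain, 4) for name, (value, gain) in reported.items()}

for name, baseline in implied.items():
    print(f"{name}: implied baseline {baseline}")
```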

BENCHMARK

Main Results on ALFWorld

Success Rate (%) on ALFWorld unseen split for Neuro-Symbolic Dual Memory Framework and baselines.

BENCHMARK

Feasibility Memory Ablation on ALFWorld Subset

Invalid Action Rate (IAR) on the 50-task ALFWorld subset for different Feasibility Memory setups.

KEY INSIGHT

The Counterintuitive Finding

On the 50-task ALFWorld subset, Prompt Rules reduce the invalid action rate to 12.11% but drop the success rate to 84%, below the No Rules baseline of 88%.

This is surprising because stronger constraints usually seem beneficial, yet Neuro-Symbolic Dual Memory Framework shows that overly strict prompt-based feasibility constraints harm long-horizon progress even with fewer invalid actions.
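
To put the size of this effect in perspective, the success rates can be converted back into task counts. The sketch assumes the 84 and 88 figures are percentages of the 50-task subset; the helper name `solved_tasks` is made up for this example.

```python
SUBSET_SIZE = 50  # tasks in the ALFWorld ablation subset


def solved_tasks(success_rate_pct: float) -> int:
    """Convert a success rate in percent into a whole number of solved tasks."""
    return round(SUBSET_SIZE * success_rate_pct / 100)


prompt_rules = solved_tasks(84)  # Prompt Rules setup
no_rules = solved_tasks(88)      # No Rules baseline
gap = no_rules - prompt_rules

print(prompt_rules, no_rules, gap)
```

Under that assumption the gap amounts to two tasks out of 50, so the finding is best read as a direction (fewer invalid actions did not translate into more completed tasks) rather than a large effect.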

WHY IT MATTERS

What this unlocks for the field

Neuro-Symbolic Dual Memory Framework enables LLM agents to follow stage-aware procedural blueprints while enforcing executable Python verifiers, stabilizing behavior over long horizons.

Builders can now design agents that reuse distilled successes and failures separately, achieving shorter trajectories, fewer invalid actions, and higher success without expanding the context window.
