HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model

Authors: Mengkang Hu, Tianxing Chen, Qiguang Chen, et al.

2024

TL;DR

HiAgent uses subgoal-based hierarchical working memory with observation summarization and trajectory retrieval to double the success rate (42% vs. 21%) on long-horizon agent tasks.



THE PROBLEM

Long-horizon agents drown in redundant context and stall at a 21.00% success rate

STANDARD agents push all historical action-observation pairs into context, creating a long, redundant working memory that harms reasoning on long-horizon tasks.

On AgentBoard tasks like Blocksworld and Tyreworld, this STANDARD strategy yields only a 21.00% overall success rate, plus long trajectories that waste context and runtime.

HOW IT WORKS

HiAgent — subgoal-based hierarchical working memory

HiAgent’s core mechanism combines Subgoal-based Hierarchical Working Memory, Observation Summarization, and Trajectory Retrieval to chunk trajectories by subgoal and compress past details.

You can think of HiAgent like a programmer using RAM for the current function and a log file for completed functions, only reopening logs when debugging.

This hierarchical working memory lets HiAgent keep context focused on the current subgoal while still recalling detailed past trajectories on demand, something a flat context window cannot do.

DIAGRAM

HiAgent in trial interaction and memory update flow

This diagram shows how HiAgent alternates between generating subgoals, executing actions, summarizing observations, and retrieving trajectories within a single trial.

DIAGRAM

Evaluation pipeline and ablation design for HiAgent

This diagram shows how HiAgent is evaluated on AgentBoard tasks and how ablations remove Observation Summarization and Trajectory Retrieval.

PROCESS

How HiAgent Handles a Long Horizon Agent Task

  1. Subgoal-based Hierarchical Working Memory

    HiAgent first prompts the LLM to formulate a subgoal g_i as a milestone for the task.

  2. Generate precise actions

    Conditioned on the current subgoal, HiAgent generates precise actions and collects the resulting action-observation pairs into a memory chunk for that subgoal.

  3. Observation Summarization

    When HiAgent determines the subgoal is fulfilled, Observation Summarization compresses the chunk into a summarized observation s_i and replaces the detailed pairs with (g_i, s_i).

  4. Trajectory Retrieval

    If HiAgent later needs details from a past subgoal, Trajectory Retrieval recalls that subgoal's full action-observation trajectory back into working memory.
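The four steps above can be sketched as a single control loop. This is an illustrative sketch under assumptions: the `llm` and `env` interfaces (`propose_subgoal`, `subgoal_done`, `next_action`, `summarize`, `step`, `task_done`) are hypothetical stand-ins, not the paper's API.

```python
def run_episode(llm, env, max_subgoals=30):
    """Minimal HiAgent-style loop: one memory chunk per subgoal;
    each finished chunk collapses to (subgoal, summary)."""
    memory = []  # list of {"goal": g_i, "pairs": [...], "summary": s_i or None}
    for _ in range(max_subgoals):
        # Step 1: formulate the next subgoal g_i from the compressed history.
        goal = llm.propose_subgoal(memory)
        chunk = {"goal": goal, "pairs": [], "summary": None}
        memory.append(chunk)
        # Step 2: act until the LLM judges the subgoal fulfilled.
        while not llm.subgoal_done(goal, chunk["pairs"]):
            action = llm.next_action(goal, memory)   # may trigger Step 4 (retrieval)
            obs = env.step(action)
            chunk["pairs"].append((action, obs))
        # Step 3: Observation Summarization replaces detail with s_i.
        chunk["summary"] = llm.summarize(goal, chunk["pairs"])
        if env.task_done():
            return memory
    return memory
```

Step 4 (Trajectory Retrieval) happens inside `next_action` when the LLM asks to expand a past chunk's full pairs; the loop itself only ever grows one detailed chunk at a time.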

KEY CONTRIBUTIONS

Key Contributions

  • Hierarchical working memory management

    HiAgent introduces Subgoal-based Hierarchical Working Memory that chunks trajectories by subgoal, yielding a 42.00% overall success rate versus 21.00% for STANDARD.

  • Observation Summarization and Trajectory Retrieval

    HiAgent combines Observation Summarization and Trajectory Retrieval so past subgoals are stored as concise summaries but can be expanded into detailed trajectories when needed.

  • Comprehensive evaluation on long-horizon tasks

    HiAgent is evaluated on five AgentBoard tasks requiring more than 20 steps, reducing average steps by 3.80 and context length by 35.02% compared to STANDARD.

RESULTS

By the Numbers

Success Rate (SR)

42.00%

+21.00 over STANDARD

Progress Rate (PR)

62.55%

+23.94 over STANDARD

Average Steps

22.61 steps

3.80 fewer steps than STANDARD

Context Efficiency

64.98% of STANDARD's tokens

35.02% fewer context tokens than STANDARD

On five AgentBoard long-horizon tasks (Blocksworld, Gripper, Tyreworld, Barman, Jericho), HiAgent doubles the overall success rate and shortens trajectories versus STANDARD. These numbers show that hierarchical working memory with subgoal chunks materially improves both the effectiveness and efficiency of LLM-based agents.


BENCHMARK

Overall performance of STANDARD and HiAgent on five long-horizon agent tasks

Overall Success Rate (SR) across Blocksworld, Gripper, Tyreworld, Barman, and Jericho.

BENCHMARK

Ablation study of HiAgent on Tyreworld

Success Rate (SR) on Tyreworld for HiAgent and its ablations: w/o OS, w/o TR, and w/o OS & TR.

KEY INSIGHT

The Counterintuitive Finding

On Tyreworld, HiAgent with full hierarchical memory reaches a 60.0% success rate, while task decomposition without hiding past trajectories reaches only 40.0%.

This is surprising because many assume subgoal generation alone is enough, but HiAgent shows that without memory compression, context length and runtime actually increase.

WHY IT MATTERS

What this unlocks for the field

HiAgent unlocks long-horizon agents that maintain high executability and progress even as step counts exceed 20, by keeping working memory focused and structured.

Builders can now design LLM agents that tackle complex multi-step environments like Jericho or Tyreworld without hitting context limits or drowning in redundant history.


