AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

AuthorsRuoyao Wen, Hao Li, Chaowei Xiao, Ning Zhang

2026

TL;DR

AGENTSYS uses hierarchical context isolation with schema-validated worker agents to cut indirect prompt injection to 0.78% ASR on AgentDojo while slightly improving utility.

SharePost on XLinkedIn

Read our summary here, or open the publisher PDF on the next tab.

THE PROBLEM

Attack persistence from bloated memory hits 60.53 percent ASR

Conventional agents append every tool output into context, so early injections persist and can reach 60.53% attack success when injected in round one of four.

This persistent contamination makes multi step agents in benchmarks like AgentDojo both insecure and less accurate, dropping benign utility from 44.46% on short tasks to 19.08% on long tasks.

HOW IT WORKS

AGENTSYS: Hierarchical memory with isolated worker agents

AGENTSYS combines a Main Agent, Worker Agents, Intent Schemas, and an Alignment Validator to isolate untrusted tool outputs and only pass schema validated JSON upward.

You can think of AGENTSYS like an operating system: the main agent is the kernel, and worker agents are sandboxed processes with their own private memory.

This design lets AGENTSYS keep the main context short and clean, eliminating attack persistence and utility degradation that a plain context window cannot avoid.

DIAGRAM

AGENTSYS query time flow with isolated worker contexts

This diagram shows how AGENTSYS routes a single user task through the main agent, worker agents, validator, and back as a structured observation.

DIAGRAM

AGENTSYS evaluation and ablation pipeline

This diagram shows how AGENTSYS is evaluated on AgentDojo and ASB, including ablations and overhead analysis.

PROCESS

How AGENTSYS Handles a Multi Step Agent Task

  1. 01

    Context Bounded Delegation in Main Agent

    AGENTSYS lets the Main Agent choose a tool and declare an Intent Schema before seeing outputs, keeping only this schema and compact trace in memory.

  2. 02

    Isolated Context Extraction in Worker Agents

    AGENTSYS spawns a Worker Agent with only the raw tool output, the Intent Schema, and Stack, preventing user query and long history from leaking in.

  3. 03

    Validator Mediated Recursion Control

    AGENTSYS uses the Alignment Validator on command tools from Worker Agents, consulting the user query and Stack but never raw outputs.

  4. 04

    Bounded Recovery Mechanism

    AGENTSYS sanitizes suspicious outputs, restarts extraction with a fixed retry budget, and finally returns either structured JSON or a deterministic error object.

KEY CONTRIBUTIONS

Key Contributions

  • 01

    Hierarchical memory management for LLM agents

    AGENTSYS introduces Main Agent and Worker Agents with Intent Schemas, cutting ASR to 2.19% using context isolation alone in the w o Validator and Sanitizer variant.

  • 02

    Validator mediated secure recursion

    AGENTSYS adds an Alignment Validator and event triggered checks on command tools, achieving 0.78% ASR on AgentDojo while keeping 64.36% benign utility.

  • 03

    Efficient defense with bounded overhead

    AGENTSYS combines worker isolation, validator, and sanitizer to reach defense quality 63.86 on AgentDojo with 3.25M tokens, outperforming CaMeL’s 29.97 quality at 6.09M tokens.

RESULTS

By the Numbers

Benign Util.

64.36%

+0.82 over No Defense

Attacked Util.

52.87%

+4.60 over No Defense

ASR

0.78%

-29.88 vs No Defense

Defense Quality

63.86

+19.80 over No Defense

On AgentDojo, which tests Banking, Slack, Travel, and Workspace scenarios under indirect prompt injection, AGENTSYS reduces ASR from 30.66% to 0.78% while slightly improving benign utility. This shows AGENTSYS can harden agents without sacrificing performance, even on long horizon tool use tasks.

BENCHMARK

By the Numbers

On AgentDojo, which tests Banking, Slack, Travel, and Workspace scenarios under indirect prompt injection, AGENTSYS reduces ASR from 30.66% to 0.78% while slightly improving benign utility. This shows AGENTSYS can harden agents without sacrificing performance, even on long horizon tool use tasks.

BENCHMARK

Main experimental results on AgentDojo using GPT 4o mini

Benign Utility (%) comparison on AgentDojo under different defenses.

BENCHMARK

AGENTSYS ablation on AgentDojo under indirect prompt injection

Attack Success Rate (%) for AGENTSYS and its ablations on AgentDojo.

KEY INSIGHT

The Counterintuitive Finding

AGENTSYS slightly increases benign utility on AgentDojo from 63.54% to 64.36%, even though it aggressively discards verbose tool outputs and reasoning traces.

This is surprising because many defenses trade accuracy for safety, but AGENTSYS shows that strict memory isolation can improve reasoning instead of hurting it.

WHY IT MATTERS

What this unlocks for the field

AGENTSYS unlocks secure, dynamic multi step agents whose main memory never directly sees raw external content or subtask reasoning traces.

Builders can now design long horizon, tool heavy workflows where only schema validated JSON crosses boundaries, making indirect prompt injection both rarer and easier to audit.

~14 min read← Back to papers

Related papers

BenchmarkAgent Memory

Active Context Compression: Autonomous Memory Management in LLM Agents

Nikhil Verma

· 2026

Focus Agent adds start_focus, complete_focus, a persistent Knowledge block, and an optimized Persistent Bash plus String-Replace Editor scaffold to actively compress context during long software-engineering tasks. On five hard SWE-bench Lite instances against a Baseline ReAct agent, Focus Agent achieves 22.7% token reduction (14.9M → 11.5M) while matching 3/5 = 60% task success.

Questions about this paper?

Paper: AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

Answers use this explainer on Memory Papers.

Checking…