AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

AuthorsRuoyao Wen, Hao Li, Chaowei Xiao, Ning Zhang

2026

TL;DR

AGENTSYS uses hierarchical context isolation with schema-validated worker agents to cut indirect prompt injection to 0.78% ASR on AgentDojo while slightly improving utility.

SharePost on X LinkedIn

Read our summary here, or open the publisher PDF on the next tab.

THE PROBLEM

Attack persistence from bloated memory hits 60.53 percent ASR

Conventional agents append every tool output into context, so early injections persist and can reach 60.53% attack success when injected in round one of four.

This persistent contamination makes multi step agents in benchmarks like AgentDojo both insecure and less accurate, dropping benign utility from 44.46% on short tasks to 19.08% on long tasks.

HOW IT WORKS

AGENTSYS: Hierarchical memory with isolated worker agents

AGENTSYS combines a Main Agent, Worker Agents, Intent Schemas, and an Alignment Validator to isolate untrusted tool outputs and only pass schema validated JSON upward.

You can think of AGENTSYS like an operating system: the main agent is the kernel, and worker agents are sandboxed processes with their own private memory.

This design lets AGENTSYS keep the main context short and clean, eliminating attack persistence and utility degradation that a plain context window cannot avoid.

DIAGRAM

AGENTSYS query time flow with isolated worker contexts

This diagram shows how AGENTSYS routes a single user task through the main agent, worker agents, validator, and back as a structured observation.

DIAGRAM

AGENTSYS evaluation and ablation pipeline

This diagram shows how AGENTSYS is evaluated on AgentDojo and ASB, including ablations and overhead analysis.

PROCESS

How AGENTSYS Handles a Multi Step Agent Task

01
Context Bounded Delegation in Main Agent
AGENTSYS lets the Main Agent choose a tool and declare an Intent Schema before seeing outputs, keeping only this schema and compact trace in memory.
02
Isolated Context Extraction in Worker Agents
AGENTSYS spawns a Worker Agent with only the raw tool output, the Intent Schema, and Stack, preventing user query and long history from leaking in.
03
Validator Mediated Recursion Control
AGENTSYS uses the Alignment Validator on command tools from Worker Agents, consulting the user query and Stack but never raw outputs.
04
Bounded Recovery Mechanism
AGENTSYS sanitizes suspicious outputs, restarts extraction with a fixed retry budget, and finally returns either structured JSON or a deterministic error object.

KEY CONTRIBUTIONS

Key Contributions

01
Hierarchical memory management for LLM agents
AGENTSYS introduces Main Agent and Worker Agents with Intent Schemas, cutting ASR to 2.19% using context isolation alone in the w o Validator and Sanitizer variant.
02
Validator mediated secure recursion
AGENTSYS adds an Alignment Validator and event triggered checks on command tools, achieving 0.78% ASR on AgentDojo while keeping 64.36% benign utility.
03
Efficient defense with bounded overhead
AGENTSYS combines worker isolation, validator, and sanitizer to reach defense quality 63.86 on AgentDojo with 3.25M tokens, outperforming CaMeL’s 29.97 quality at 6.09M tokens.

RESULTS

By the Numbers

Benign Util.

64.36%

+0.82 over No Defense

Attacked Util.

52.87%

+4.60 over No Defense

ASR

0.78%

-29.88 vs No Defense

Defense Quality

63.86

+19.80 over No Defense

On AgentDojo, which tests Banking, Slack, Travel, and Workspace scenarios under indirect prompt injection, AGENTSYS reduces ASR from 30.66% to 0.78% while slightly improving benign utility. This shows AGENTSYS can harden agents without sacrificing performance, even on long horizon tool use tasks.

BENCHMARK

By the Numbers

BENCHMARK

Main experimental results on AgentDojo using GPT 4o mini

Benign Utility (%) comparison on AgentDojo under different defenses.

BENCHMARK

AGENTSYS ablation on AgentDojo under indirect prompt injection

Attack Success Rate (%) for AGENTSYS and its ablations on AgentDojo.

KEY INSIGHT

The Counterintuitive Finding

AGENTSYS slightly increases benign utility on AgentDojo from 63.54% to 64.36%, even though it aggressively discards verbose tool outputs and reasoning traces.

This is surprising because many defenses trade accuracy for safety, but AGENTSYS shows that strict memory isolation can improve reasoning instead of hurting it.

WHY IT MATTERS

What this unlocks for the field

AGENTSYS unlocks secure, dynamic multi step agents whose main memory never directly sees raw external content or subtask reasoning traces.

Builders can now design long horizon, tool heavy workflows where only schema validated JSON crosses boundaries, making indirect prompt injection both rarer and easier to audit.

~14 min read← Back to papers

Related papers

BenchmarkAgent Memory

Active Context Compression: Autonomous Memory Management in LLM Agents

Nikhil Verma

· 2026

Focus Agent adds start_focus, complete_focus, a persistent Knowledge block, and an optimized Persistent Bash plus String-Replace Editor scaffold to actively compress context during long software-engineering tasks. On five hard SWE-bench Lite instances against a Baseline ReAct agent, Focus Agent achieves 22.7% token reduction (14.9M → 11.5M) while matching 3/5 = 60% task success.

arXiv:2601.07190 Read explainer

RAGBenchmarkAgent MemoryMemory Architecture

ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying

Xingyu Lyu, Jianfeng He et al.

· 2026

ADAM combines Anchor extraction, Distribution estimation, Anchor selection, and Query generation to adaptively probe agent memory via an auxiliary generator and entropy based selection. On the EHRAgent benchmark with Llama2-7b-chat, ADAM reaches EQ=77 and ASR=1.00, compared to MEXTRA’s EQ=44 and ASR=0.89.

arXiv:2604.09747 Read explainer

BenchmarkMemory Architecture

AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents

Shannan Yan, Jingchen Ni et al.

· 2026

AdaMem organizes dialogue history into Working Memory, Episodic Memory, Persona Memory, and Graph Memory coordinated by a Memory Agent, Research Agent, and Working Agent. On LoCoMo with GPT-4.1-mini, AdaMem achieves 44.65 F1 overall, beating the best baseline LangMem at 41.76 F1 by +2.89.

arXiv:2603.16496 Read explainer

Questions about this paper?

Answers use this explainer on Memory Papers.

Checking…