Active Context Compression: Autonomous Memory Management in LLM Agents

Authors: Nikhil Verma

2026

TL;DR

Focus Agent prunes its own context with autonomous start_focus and complete_focus calls while maintaining a persistent Knowledge block, cutting SWE-bench Lite tokens by 22.7% (14.9M → 11.5M) with unchanged 60% task success.


THE PROBLEM

Context Bloat Makes Agents Sluggish and Distracted (22.7% of Baseline Tokens Wasted)

Focus Agent targets Context Bloat, where interaction history grows so large that costs explode and reasoning degrades due to irrelevant past errors.

On context-intensive SWE-bench Lite tasks, a standard Append-Only ReAct agent reprocesses every log line on every turn, causing quadratic cost growth, higher latency, and context poisoning that derails long-horizon software engineering.
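To make the cost argument concrete, here is a back-of-the-envelope sketch (all numbers hypothetical, not from the paper) comparing total processed tokens under append-only growth versus the sawtooth pattern described in the next section:

```python
# Illustrative sketch: cumulative tokens processed by an append-only agent.
# Numbers are hypothetical, not from the paper. Each turn re-sends the
# entire history, so total processed tokens grow quadratically in turns.

TOKENS_PER_TURN = 2_000   # assumed average tool output + reasoning per turn
TURNS = 100               # assumed long-horizon task length

append_only_total = 0
context = 0
for _ in range(TURNS):
    context += TOKENS_PER_TURN      # history grows linearly...
    append_only_total += context    # ...so reprocessing cost grows quadratically

# A sawtooth pattern that prunes back to a small Knowledge block every 15 turns.
sawtooth_total = 0
context = 0
KNOWLEDGE_BLOCK = 1_000             # assumed size of the distilled summary
for turn in range(1, TURNS + 1):
    context += TOKENS_PER_TURN
    sawtooth_total += context
    if turn % 15 == 0:              # complete_focus: prune back to the summary
        context = KNOWLEDGE_BLOCK

print(f"append-only: {append_only_total:,} tokens")   # ~10.1M
print(f"sawtooth:    {sawtooth_total:,} tokens")      # a small fraction of that
```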

HOW IT WORKS

Focus Loop with start_focus and complete_focus

Focus Agent introduces start_focus, complete_focus, a persistent Knowledge block, and an optimized Persistent Bash plus String Replace Editor scaffold to manage context autonomously.

The design mirrors slime mold exploration: Focus Agent treats the active context like RAM for exploration, consolidates findings into a Knowledge block that acts like disk, and retracts from dead ends while leaving useful traces.

This Focus Loop lets Focus Agent prune raw logs, maintain a sawtooth context pattern, and preserve distilled learnings in space a plain context window would waste on verbose tool outputs.
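The paper's exact tool schemas are not reproduced in this summary; a plausible minimal definition of the two focus tools, written in Anthropic's tool-use schema format, might look like this (field wording and parameter names are assumptions):

```python
# Hypothetical tool definitions for the Focus Loop, in Anthropic's tool-use
# JSON schema format. Descriptions and parameter names are assumptions; the
# paper's exact schemas are not reproduced here.

FOCUS_TOOLS = [
    {
        "name": "start_focus",
        "description": "Declare a subtask and checkpoint the current context. "
                       "All messages after this point become prunable.",
        "input_schema": {
            "type": "object",
            "properties": {
                "subtask": {
                    "type": "string",
                    "description": "What this focus will explore.",
                },
            },
            "required": ["subtask"],
        },
    },
    {
        "name": "complete_focus",
        "description": "Consolidate the focus into the Knowledge block and "
                       "prune all messages back to the start_focus checkpoint.",
        "input_schema": {
            "type": "object",
            "properties": {
                "attempted": {"type": "string"},
                "learned": {"type": "string"},
                "outcome": {"type": "string"},
            },
            "required": ["attempted", "learned", "outcome"],
        },
    },
]
```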

DIAGRAM

Focus Agent Sawtooth Context Cycle

This diagram shows how Focus Agent alternates exploration and compression phases using start_focus and complete_focus to create a sawtooth context pattern.

DIAGRAM

SWE-bench Lite Evaluation Pipeline for Focus Agent

This diagram shows how Focus Agent is evaluated on SWE-bench Lite with an optimized two-tool scaffold and Docker-based test verification.

PROCESS

How Focus Agent Handles a SWE-bench Lite Task

  1. Start Focus

    Focus Agent calls start_focus to declare the subtask, mark a checkpoint, and link upcoming exploration to the Knowledge block.

  2. Explore

    Focus Agent uses Persistent Bash and String Replace Editor repeatedly, following the directive to use tools as much as possible, ideally more than 100 times.

  3. Consolidate

    Focus Agent invokes complete_focus to summarize what was attempted, what was learned, and the outcome into a structured Knowledge block entry.

  4. Withdraw

    Focus Agent prunes all messages between the start_focus checkpoint and the current step, shrinking context while keeping the Knowledge block at the top (a minimal sketch of this bookkeeping follows the list).
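A minimal sketch of this bookkeeping, assuming the history is a flat message list with the Knowledge block pinned after the system prompt (class and method names are illustrative, not the paper's implementation):

```python
# Minimal sketch of the Focus Loop bookkeeping. The message-list layout and
# all names here are assumptions for illustration, not the paper's code.

class FocusContext:
    def __init__(self, system_prompt: str):
        self.knowledge_block: list[str] = []   # persistent, survives pruning
        self.messages: list[dict] = [{"role": "system", "content": system_prompt}]
        self._checkpoint: int | None = None

    def start_focus(self, subtask: str) -> None:
        # Checkpoint: everything appended after this index is prunable.
        self._checkpoint = len(self.messages)
        self.messages.append({"role": "assistant", "content": f"[focus] {subtask}"})

    def complete_focus(self, attempted: str, learned: str, outcome: str) -> None:
        # Consolidate: distill the exploration into a structured Knowledge entry.
        self.knowledge_block.append(
            f"Attempted: {attempted}\nLearned: {learned}\nOutcome: {outcome}"
        )
        # Withdraw: prune all messages between the checkpoint and the current step.
        assert self._checkpoint is not None, "complete_focus without start_focus"
        del self.messages[self._checkpoint:]
        self._checkpoint = None

    def render(self) -> list[dict]:
        # The Knowledge block stays at the top, right after the system prompt.
        knowledge = {
            "role": "user",
            "content": "KNOWLEDGE:\n" + "\n---\n".join(self.knowledge_block),
        }
        return [self.messages[0], knowledge, *self.messages[1:]]
```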

KEY CONTRIBUTIONS

Key Contributions

  • Intra-Trajectory Compression

    Focus Agent introduces intra-trajectory compression with start_focus and complete_focus, letting agents prune history mid-task instead of only between episodes.

  • Aggressive Compression Prompting

    Focus Agent uses aggressive prompting, including mandatory workflows and reminders to compress every 10–15 tool calls, raising average compressions to 6.0 per task (a sketch of the reminder mechanism follows this list).

  • Optimized SWE-bench Scaffold

    Focus Agent is evaluated with a Persistent Bash and String Replace Editor scaffold, matching Anthropic best practices while achieving 22.7% token savings at 60% success.
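The exact prompts are not reproduced in this summary; a sketch of the compression reminder described in the second contribution, with assumed wording and an assumed injection mechanism, could look like:

```python
# Sketch of the aggressive-compression reminder described above. The trigger
# cadence (every 10-15 tool calls) comes from the text; the wording and the
# injection mechanism are assumptions.

REMINDER = (
    "You have made {n} tool calls since your last complete_focus. "
    "Compress now: call complete_focus to consolidate what you attempted, "
    "what you learned, and the outcome, then start a new focus."
)

def maybe_inject_reminder(messages: list[dict], calls_since_compress: int) -> None:
    # Fire at the lower edge of the 10-15 call window described in the text.
    if calls_since_compress >= 10:
        messages.append(
            {"role": "user", "content": REMINDER.format(n=calls_since_compress)}
        )
```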

RESULTS

By the Numbers

  • Task Success (Tests Pass): 3/5 (60%), +0.0 percentage points vs Baseline
  • Total Tokens: 11,526,418, down 3,394,137 from Baseline's 14,920,555
  • Avg Tokens/Task: 2,305,284, down 678,827 from Baseline's 2,984,111
  • Avg Compressions: 6.0 per task, up from Baseline's 0

These results come from five context-intensive SWE-bench Lite instances using Claude Haiku 4.5 with an identical scaffold. The 22.7% token reduction while matching 3/5 = 60% success shows that Focus Agent can self-regulate context without sacrificing task accuracy.
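The headline figures follow directly from the raw counts, as a quick check shows:

```python
# Reproducing the headline figures from the raw token counts above.
baseline, focus = 14_920_555, 11_526_418
saved = baseline - focus                          # 3,394,137 tokens
print(f"reduction: {saved / baseline:.1%}")       # 22.7%
print(f"avg saved per task: {saved / 5:,.0f}")    # 678,827 (N=5 instances)
```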


BENCHMARK

A/B Comparison on SWE-bench Lite (Haiku 4.5, N=5 Hard Instances)

Total Tokens consumed by Baseline vs Focus Agent on five hard SWE-bench Lite instances.

KEY INSIGHT

The Counterintuitive Finding

Focus Agent cuts total tokens by 22.7% (14.9M → 11.5M) while keeping task success identical at 3/5 = 60% on SWE-bench Lite.

This is surprising because earlier passive Focus prompting reduced tokens by only 6% and hurt accuracy, contradicting the assumption that compression always costs performance.

WHY IT MATTERS

What this unlocks for the field

Focus Agent shows that capable LLM agents can autonomously self-regulate context using intra-trajectory compression and a persistent Knowledge block.

Builders can now design cost-aware, long-horizon coding agents that stay within context limits while retaining distilled learnings, without external memory infrastructure or separate compression models.


