Benchmark · Agent Memory
Focus Agent adds start_focus, complete_focus, a persistent Knowledge block, and an optimized Persistent Bash plus String-Replace Editor scaffold to actively compress context during long software-engineering tasks. On five hard SWE-bench Lite instances against a Baseline ReAct agent, Focus Agent achieves 22.7% token reduction (14.9M → 11.5M) while matching 3/5 = 60% task success.
Agent Memory
Xiaohui Zhang, Zequn Sun et al.
· 2026
ActMem transforms dialogue history into atomic facts via Memory Fact Extraction, groups them with Fact Clustering, links them through a Memory KG Construction module, and uses Counterfactual-based Retrieval and Reasoning for action-aware answers. On ActMemEval, ActMem reaches 76.52% QA accuracy with DeepSeek-V3, beating LightMem’s 63.97% by 12.55 points and NaiveRAG’s 61.54%.
RAG · Benchmark · Agent Memory · Memory Architecture
Xingyu Lyu, Jianfeng He et al.
· 2026
ADAM combines Anchor extraction, Distribution estimation, Anchor selection, and Query generation to adaptively probe agent memory via an auxiliary generator and entropy-based selection. On the EHRAgent benchmark with Llama2-7b-chat, ADAM reaches EQ=77 and ASR=1.00, compared to MEXTRA’s EQ=44 and ASR=0.89.
Agent Memory · Long-Term Memory
Guilin Zhang, Wei Jiang et al.
· 2026
A-MAC scores candidate memories on Utility, Confidence, Novelty, Recency, and Type Prior, and combines the five signals with a learned linear admission policy (Algorithm 1, A-MAC Memory Admission). On the LoCoMo benchmark, A-MAC achieves F1 0.583 and 2644 ms latency, improving F1 by 0.042 and reducing latency by 1187 ms compared to A-mem.
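A minimal sketch of a linear admission rule over the five signals named above. The weights, the threshold, and the names `CandidateFeatures`, `admission_score`, and `admit` are illustrative assumptions; A-MAC learns its policy, and its Algorithm 1 is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class CandidateFeatures:
    """Five per-candidate signals, each assumed to be scaled to [0, 1]."""
    utility: float
    confidence: float
    novelty: float
    recency: float
    type_prior: float


# Hypothetical weights and threshold; a learned policy would fit these.
WEIGHTS = {
    "utility": 0.3,
    "confidence": 0.2,
    "novelty": 0.2,
    "recency": 0.1,
    "type_prior": 0.2,
}
THRESHOLD = 0.5


def admission_score(features, weights=WEIGHTS):
    """Weighted sum of the five signals."""
    return sum(w * getattr(features, name) for name, w in weights.items())


def admit(features, threshold=THRESHOLD):
    """Admit the candidate into memory when its score clears the threshold."""
    return admission_score(features) >= threshold
```

With these assumed weights, a candidate that is useful and confident is admitted even if it is not very recent, while a recent but low-utility candidate is rejected.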
Cognitive Architecture · Agent Memory
Aeon restructures LLM memory using the Atlas, Trace, Semantic Lookaside Buffer, Write Ahead Log, and Sidecar Blob Arena inside a zero-copy Core Shell kernel. Aeon achieves 4.70 ns INT8 dot products, 3.09 µs Atlas traversal at 100K nodes, 3.1× compression, and P99 read latency of 750 ns under 16-thread contention, measured against FP32 and flat-scan baselines.
Benchmark · Agent Memory
Yi Yu, Liuyi Yao et al.
arXiv 2026 · 2026
Agentic Memory (AgeMem) exposes memory management tools, a three-stage progressive RL strategy, and step-wise GRPO directly inside the agent policy to jointly control long-term and short-term memory. On Qwen3-4B-Instruct, AgeMem attains 54.31% average performance across ALFWorld, SciWorld, PDDL, BabyAI, and HotpotQA, exceeding the best baseline A-Mem at 45.74%.
Benchmark · Agent Memory
Yakov Pyotr Shkolnikov
· 2026
Agent Memory Below the Prompt stores each agent’s KV state in a block pool, quantizes it via a Q4 pipeline, reloads it with BatchQuantizedKVCache, and reuses it across phases using cross-phase context injection. On Gemma 3 12B, Agent Memory Below the Prompt reduces cold TTFT from 172,096 ms to 1,264 ms at 32K context (136×) compared to FP16 prefix caching baselines like vllm-mlx.
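To make the idea of 4-bit storage concrete, here is a generic affine 4-bit quantizer for one group of values. It is a sketch only: the summary does not describe the paper's Q4 pipeline (group size, packing, or layout), and `quantize_q4` and `dequantize_q4` are hypothetical names.

```python
def quantize_q4(values):
    """Map a group of floats to 4-bit codes (0..15) with one scale and offset."""
    lo, hi = min(values), max(values)
    # Avoid a zero scale when every value in the group is identical.
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_q4(codes, scale, lo):
    """Recover approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]
```

Each value is stored as one of 16 levels, so the reconstruction error per element is at most half the scale. That is the trade a quantized cache makes in exchange for a smaller footprint to store and reload.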
Benchmark · Agent Memory
Ruoyao Wen, Hao Li et al.
· 2026
AGENTSYS organizes a Main Agent, Worker Agents, Intent Schemas, and an Alignment Validator into a hierarchical memory system that isolates raw tool outputs and only admits schema-validated JSON. On AgentDojo, AGENTSYS reaches 52.87% attacked utility and 0.78% ASR versus 48.27% and 30.66% for the No Defense baseline.
Cognitive Architecture · Agent Memory
Bin Wen, Ruoxuan Zhang et al.
· 2026
Neuro-Symbolic Dual Memory Framework uses Progress Memory, Feasibility Memory, a Blueprint Planner Agent, a Progress Monitor Agent, and an Actor Agent to decouple semantic progress guidance from executable feasibility checks. On ALFWorld, Neuro-Symbolic Dual Memory Framework achieves 94.78% success rate versus 88.81% for AWM, and on WebShop reaches 0.7132 score versus 0.5998 for WALL-E 2.0.
Agent Memory · Long-Term Memory
Weiquan Huang, Zixuan Wang et al.
· 2026
AMA orchestrates four agents — the Constructor, Retriever, Judge, and Refresher — to build Raw Text, Fact Knowledge, and Episode Memory and route queries adaptively across these granularities. On the LoCoMo benchmark with GPT-4.1-mini, AMA achieves an overall LLM Score of 0.805 compared to Nemori’s 0.774, while reducing token consumption by approximately 80% relative to FullContext.
Benchmark · Agent Memory
Cheng Jiayang, Dongyu Ru et al.
· 2026
AMemGym combines Structured Data Generation, On-Policy Interaction, Evaluation Metrics, and Meta-Evaluation to script user state trajectories, drive LLM-simulated role-play, and score write–read–utilization behavior. On AMemGym’s base configuration, AWE-(2,4,30) reaches a 0.291 normalized memory score on interactive evaluation, while native gpt-4.1-mini only achieves 0.203, exposing substantial gaps between memory agents and plain long-context LLMs.
Agent Memory · Long-Term Memory
AMV-L manages agent memory using a Memory Value Model, Tiered Lifecycle, Bounded Retrieval Path, and Lifecycle Manager to decouple retention from retrieval eligibility. Under a 70k-request long-running workload, AMV-L improves throughput from 9.027 to 36.977 req/s over TTL and reduces p99 latency from 5398.167 ms to 1233.430 ms while matching LRU’s retrieval quality.
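One way to picture "decoupling retention from retrieval eligibility" is a tier label that controls what the retriever may scan, independent of what stays stored. The thresholds, tier names, and the helpers `assign_tier` and `retrieval_candidates` below are assumptions for illustration, not AMV-L's actual design.

```python
from dataclasses import dataclass

# Assumed thresholds on a value score in [0, 1].
HOT_THRESHOLD = 0.7
COLD_THRESHOLD = 0.3


@dataclass
class MemoryItem:
    key: str
    value_score: float  # hypothetical output of a memory value model
    tier: str = "warm"


def assign_tier(item):
    """Place an item in a tier based on its value score."""
    if item.value_score >= HOT_THRESHOLD:
        item.tier = "hot"
    elif item.value_score < COLD_THRESHOLD:
        item.tier = "cold"
    else:
        item.tier = "warm"
    return item.tier


def retrieval_candidates(items, max_candidates):
    """Bounded retrieval path: cold items stay stored but are never scanned."""
    eligible = [i for i in items if i.tier != "cold"]
    eligible.sort(key=lambda i: i.value_score, reverse=True)
    return eligible[:max_candidates]
```

Because cold items are skipped rather than deleted, the store can keep growing while the per-request scan stays capped at `max_candidates`.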
Survey · Agent Memory
Dongming Jiang, Yi Li et al.
arXiv 2026 · 2026
Anatomy of Agentic Memory organizes agentic memory into four structures using components like Lightweight Semantic Memory, Entity-Centric and Personalized Memory, Episodic and Reflective Memory, and Structured and Hierarchical Memory. Anatomy of Agentic Memory then reports comparative results such as Nemori’s 0.781 semantic judge score on LoCoMo versus SimpleMem’s 0.298, and latency differences like 1.129 s for Nemori versus 32.372 s for MemoryOS.
Survey · Benchmark · Agent Memory · Long-Term Memory · Memory Architecture
Zehao Lin, Chunyu Li, Kai Chen
· 2026
Mnemonic Sovereignty analyzes the long-term Write, Store, Retrieve, Execute, Share, and Forget/Rollback phases against integrity, confidentiality, availability, and governance objectives for agent memory. Mnemonic Sovereignty’s lifecycle matrix shows most of the ~70 works cluster on write and retrieve integrity, leaving store, availability, and governance primitives like write-gate validation and post-deletion verification almost entirely unexplored.
Benchmark · Agent Memory
Samuel Sameer Tanguturi
· 2026
ATANT v1.1 structurally analyzes seven benchmarks using the 7 v1.0 continuity properties, the 10 checkpoints, a property-coverage matrix, and the Kenotic v1.0 reference implementation. ATANT v1.1 reports 96% ATANT cumulative-scale versus 8.8% LOCOMO substring accuracy, showing that LOCOMO, LongMemEval, BEAM, MemoryBench, Zep eval, MemGPT/Letta, and RULER measure different properties from continuity.
Agent Memory
Yupeng Huo, Yaxi Lu et al.
· 2026
AtomMem reframes agent memory as a POMDP and composes atomic CRUD operations over a hybrid scratchpad plus vector memory storage using a GRPO-based RL policy. On HotpotQA, 2WikiMultiHopQA, Musique, GAIA, and WebWalkerQA, AtomMem reaches an average 58.8 score, beating MemAgent’s 56.7 with the same Qwen3-8B backbone.
Benchmark · Agent Memory · Long-Term Memory
Zexue He, Yu Wang et al.
· 2026
MEMORYARENA orchestrates Memory-Agent-Environment Loops, Multi-Session Working Flow, Bundled Web Shopping, Group Travel Planning, and Progressive Web Search to stress-test how agents store and reuse information across sessions. MEMORYARENA’s main result is that agents with near-saturated scores on long-context benchmarks like LoCoMo still obtain Task Success Rates as low as 0.00–0.12 across its four environments.
Benchmark · Agent Memory
Mofasshara Rafique, Laurent Bindschaedler
· 2026
ClawVM manages agent state as typed pages via the SessionPageTable, RepresentationSelector, FaultObserver, WritebackJournal, and ClawVMEngine inside the agent harness. Across four OpenClaw-derived workloads and six token budgets, ClawVM cuts explicit faults from 67.8 (retrieval baseline) and 1.5 (Compaction-Hybrid) to 0.0 while adding median <50 μs policy-engine overhead per turn.
Cognitive Architecture · Agent Memory
Zhixing You, Jiachen Yuan, Jason Cai
· 2026
D-Mem combines Mem0∗, Quality Gating, and Full Deliberation into a dual-process memory system that incrementally stores vector memories and selectively scans raw history. On LoCoMo with GPT-4o-mini, D-Mem’s Quality Gating reaches 53.5 F1 versus the Mem0∗ baseline’s 51.2 F1, recovering 96.7% of the 55.3 F1 Full Deliberation performance with far fewer tokens.
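The dual-process pattern described above can be sketched as a cheap first pass whose answer is checked by a gate, with an expensive fallback only when the gate rejects it. All four callables, the `0.5` threshold, and the name `answer_with_gate` are hypothetical stand-ins, not D-Mem's actual interfaces.

```python
def answer_with_gate(question, fast_path, quality_gate, slow_path, threshold=0.5):
    """Try the cheap memory lookup first; deliberate over raw history only if needed."""
    draft = fast_path(question)
    if quality_gate(question, draft) >= threshold:
        return draft, "fast"
    return slow_path(question), "slow"


# Toy stand-ins so the routing can be exercised end to end.
_memory = {"Where does Ana live?": "Lisbon"}


def fast_path(question):
    """Pretend vector-memory lookup."""
    return _memory.get(question, "unknown")


def quality_gate(question, draft):
    """Pretend quality judge: reject drafts that carry no information."""
    return 0.0 if draft == "unknown" else 1.0


def slow_path(question):
    """Pretend full scan of the raw conversation history."""
    return "answer from full history scan"
```

The token savings reported for Quality Gating come from this shape: most questions end at the fast path, and only the rejected ones pay for the full scan.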
Benchmark · Agent Memory · Long-Term Memory
Benjamin Stern, Peter Nadel
· 2026
Drawing on Memory uses dual-trace memory encoding, an evidence scoring gate, and a three-state retrieval protocol to store paired fact and scene traces in Letta’s archival memory. On LongMemEval-S, Drawing on Memory reaches 73.7% accuracy versus 53.5% for the fact-only C7-control baseline, a +20.2 percentage point gain concentrated in temporal, update, and multi-session questions.
Benchmark · Agent Memory
Xing Zhang, Guanghui Wang et al.
· 2026
Experience Compression Spectrum organizes Level 0 Raw Trace, Level 1 Episodic Memory, Level 2 Procedural Skill, and Level 3 Declarative Rule into a unified scaffold-level compression framework. Experience Compression Spectrum’s mapping of 20+ systems and <1% cross-citation rate shows that all existing agents fix a single compression level and never perform adaptive cross-level compression.
Benchmark · Agent Memory · Memory Architecture
Zhaofen Wu, Hanrong Zhang et al.
· 2026
GAM builds a Hierarchical Graph Memory Architecture with a global Topic Associative Network, local Event Progression Graphs, State-Based Memory Consolidation, and Graph-Guided Multi-Factor Retrieval to decouple encoding from consolidation. On LoCoMo with Qwen2.5-7B, GAM attains an Average F1 of 40.00 compared to Mem0’s 35.38, and on LongDialQA with Qwen2.5-7B, GAM reaches 12.55 F1 vs MemoryOS at 6.76.
Benchmark · Agent Memory · Long-Term Memory
Chingkwun Lam, Jiaxin Li et al.
· 2026
SSGM interposes a Governance Middleware, Read Filtering Gate, Write Validation Gate, and a dual substrate of Mutable Active Graph plus Immutable Episodic Log between agents and memory. SSGM unifies evolving-memory systems into a four-dimensional failure taxonomy and proves that periodic reconciliation can bound semantic drift over infinite horizons.
Benchmark · Agent Memory · Long-Term Memory · Memory Architecture
Jiaquan Zhang, Chaoning Zhang et al.
· 2026
LightMem orchestrates an SLM-1 Controller, SLM-2 Selector, SLM-3 Writer, and STM, MTM, and LTM stores to modularize retrieval, writing, and offline consolidation. On LoCoMo, LightMem reaches 34.50 F1 for GPT-4o multi-hop questions, +1.64 over A-MEM, while keeping median retrieval latency at 83 ms.
Agent Memory
Dongming Jiang, Yi Li et al.
· 2026
MAGMA organizes agent memory with an Intent-Aware Router, Adaptive Topological Retrieval, a Data Structure Layer of Relation Graphs and Vector Database, plus dual-stream Synaptic Ingestion and Asynchronous Consolidation. On LoCoMo, MAGMA achieves a 0.700 overall LLM-as-a-Judge score versus 0.590 for Nemori, and reaches 61.2% average accuracy on LongMemEval versus 56.2% for Nemori.
Benchmark · Agent Memory · Long-Term Memory
Weiwei Xie, Shaoxiong Guo et al.
· 2026
MemEvoBench combines Misleading Memory Injection, Noisy Tool Returns, Biased User Feedback, and a Memory Modification Tool (+ModTool) to stress-test long-term memory safety in LLM agents across 7 domains and 36 risk types. On the QA Style benchmark, MemEvoBench shows Gemini-2.5-Pro’s ASR drops from 67.0% (Vanilla) to 19.0% with +ModTool in Round 1, while biased feedback can push GPT-5’s QA ASR from 59.0% to 78.0% by Round 3.
Agent Memory
Zhenting Wang, Huancheng Chen et al.
· 2026
Memex(RL) optimizes Indexed Experience Memory, CompressExperience, ReadExperience, and ContextStatus so Memex keeps only an indexed summary in-context while archiving full artifacts externally. On modified ALFWorld, Memex(RL) lifts task success from 24.22% to 85.61% over the Memex agent without RL while reducing peak working context from 16,934.46 to 9,634.47 tokens.
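A minimal sketch of the "indexed summary in context, full artifact outside" idea. The method names echo the tool names in the summary, but their signatures, the word-count proxy for context size, and the `IndexedMemory` class are assumptions for illustration.

```python
class IndexedMemory:
    """Keeps short indexed summaries in context and full artifacts in an archive."""

    def __init__(self):
        self.archive = {}   # external store: index -> full artifact
        self.context = []   # in-context entries: (index, summary)

    def compress_experience(self, index, artifact, summary):
        """Archive the full artifact and keep only its summary in context."""
        self.archive[index] = artifact
        self.context.append((index, summary))

    def read_experience(self, index):
        """Dereference an index to recover the full artifact on demand."""
        return self.archive[index]

    def context_status(self):
        """Rough context size, using whitespace word count as a token proxy."""
        return sum(len(summary.split()) for _, summary in self.context)
```

A usage pattern would be to call `compress_experience` after a long tool output, then `read_experience` only when a later step actually needs the details.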
Agent Memory · Memory Architecture
Ziliang Guo, Ziheng Li et al.
· 2026
MemFactory decomposes memory agents into Module Layer, Agent Layer, Environment Layer, and Trainer Layer with plug-and-play Extractor, Updater, Retriever, and RecurrentMemoryModule components. On MemAgent eval_50, MemFactory raises Qwen3-1.7B from 0.4727 to 0.5684 and Qwen3-4B-Instruct from 0.6523 to 0.7051 using GRPO.
RAG · Agent Memory · Long-Term Memory · Memory Architecture
Memory as Metabolism defines companion knowledge systems with five retention operations (TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, AUDIT) plus memory gravity and minority-hypothesis retention over a raw buffer, active wiki, and cold memory. Instead of benchmark gains, Memory as Metabolism’s main result is a governance specification that separates descriptive, taxonomic, and normative claims and predicts improved coherence stability, fragility resistance, monoculture resistance, and effective minority-hypothesis influence for companion wikis.
Benchmark · Agent Memory
Weizhi Zhang, Xiaokai Wei et al.
· 2026
MEMORYCD builds a user memory pool Mu from lifelong Amazon Review histories and evaluates long-context prompting, Mem0, LoCoMo, ReadAgent, MemoryBank, and A-Mem across rating, ranking, and personalized text tasks. On Books and Home & Kitchen, MEMORYCD shows GPT-5 reaches RMSE 0.551–0.624 and NDCG@3 up to 0.610, while Gemini-2.5 Pro peaks at ROUGE-L 0.222 for generation, revealing substantial remaining gaps to real user behavior.
Survey · RAG · Agent Memory
Memory for Autonomous LLM Agents decomposes agent memory into a POMDP-grounded write–manage–read loop, a three-dimensional taxonomy, and five mechanism families spanning context compression, retrieval stores, reflection, hierarchical virtual context, and policy-learned management. Memory for Autonomous LLM Agents synthesizes results like Voyager’s 15.3× tech-tree speedup and MemoryArena’s 80%→45% drop to show that memory architecture often matters more than backbone choice.
Agent Memory
Yanchen Wu, Tenghui Lin et al.
· 2026
Memory in the LLM Era decomposes agent memory into Information Extraction, Memory Management, Memory Storage, and Information Retrieval, then recombines modules into a new hierarchical tree–tier architecture. On LONGMEMEVAL with Qwen2.5-7B, Memory in the LLM Era achieves 38.79 F1 overall versus 36.92 for MemTree.
Agent Memory
Zhongming Yu, Naicheng Yu et al.
arXiv 2026 · 2026
Multi-Agent Memory Architecture organizes Agent IO Layer, Agent Cache Layer, Agent Memory Layer, Agent Cache Sharing, and Agent Memory Access Protocol into a computer-architecture-style design for LLM agents. Multi-Agent Memory Architecture’s main result is a conceptual unification of shared and distributed memory plus a research agenda for multi-agent memory consistency instead of benchmark gains.
Agent Memory
Wujiang Xu, Zujie Liang et al.
· 2025
A-MEM organizes agent memory via Note Construction, Link Generation, Memory Evolution, and Retrieve Relative Memory to build an evolving, interconnected note graph. On the LoCoMo dataset, A-MEM with GPT-4o-mini reaches 27.02 F1 on multi-hop questions, +17.87 over ReadAgent, while cutting average token length from 16,910 to 2,520.
Agent Memory
Qianshan Wei, Tengchao Yang et al.
· 2025
A-MemGuard combines consensus-based validation, dual-memory structure, lesson memory, and path divergence scoring to sanitize retrieved memories and revise actions using past failures. On EHRAgent under AgentPoison, A-MemGuard reduces ASR-r from 100.0% to 2.13% and ASR-t from 100.0% to 6.38%, far below LLM Auditor and Distil Classifier.
Agent Memory
Rui Li, Zeyu Zhang et al.
· 2025
CAM builds hierarchical schemata using an incremental overlapping clustering algorithm, ego-centric disentanglement, and a Prune-and-Grow associative strategy for retrieval. On NovelQA, CAM achieves 52.3 ACC-L versus RAPTOR’s 47.8, a +4.5 point gain, while also improving efficiency in long-text reading comprehension.
RAG · Benchmark · Agent Memory · Long-Term Memory · Memory Architecture
Alessandra Terranova, Björn Ross, Alexandra Birch
· 2025
Evaluating Long-Term Memory for Long-Context Question Answering compares Full Context, RAG, A-Mem, RAG+PromptOpt, and RAG+EpMem memory components across semantic, episodic, and procedural memory for long conversational QA. On LoCoMo, RAG+EpMem reaches an average F1 ranking of 1.83 for Llama 3.2-3B Instruct and 1.80 for GPT-4o mini while using around 1,000 tokens per query versus over 23,000 for Full Context.
Agent Memory
Yuanzhe Hu, Yu Wang, Julian McAuley
ICLR 2026 · 2025
MemoryAgentBench standardizes multi-turn datasets into chunked conversations with memorization prompts, then evaluates long-context agents, RAG agents, and agentic memory agents across Accurate Retrieval, Test-Time Learning, Long-Range Understanding, and Selective Forgetting. On the overall score in Table 3, the GPT-4.1-mini long-context agent reaches 71.8 on Accurate Retrieval tasks compared to 49.2 for the GPT-4o-mini long-context baseline.
Benchmark · Agent Memory · Memory Architecture
MaRS organizes agent memory into episodic, semantic, social, and task nodes with provenance, scored by a privacy-aware retention controller and governed by FIFO, LRU, Priority Decay, Reflection-Summary, Random-Drop, and Hybrid policies. On the FiFA benchmark, the Hybrid policy in MaRS achieves a composite score of ≈0.911 across 300 runs and five memory budgets, outperforming simpler policies while preserving privacy and cost efficiency.
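Three of the retention policies named above can be contrasted with a small sketch that keeps `budget` nodes and drops the rest. The `Node` fields, the half-life decay formula, and the function names are assumptions; the summary does not give MaRS's scoring details, and the privacy-aware controller is omitted.

```python
from dataclasses import dataclass


@dataclass
class Node:
    key: str
    created_at: float
    last_access: float
    priority: float


def keep_fifo(nodes, budget):
    """FIFO: evict the oldest-created nodes first, keeping the newest."""
    return sorted(nodes, key=lambda n: n.created_at, reverse=True)[:budget]


def keep_lru(nodes, budget):
    """LRU: evict the least recently accessed nodes first."""
    return sorted(nodes, key=lambda n: n.last_access, reverse=True)[:budget]


def keep_priority_decay(nodes, budget, now, half_life=10.0):
    """Priority Decay (assumed form): priority halves every `half_life` time units."""
    def score(n):
        age = now - n.last_access
        return n.priority * 0.5 ** (age / half_life)
    return sorted(nodes, key=score, reverse=True)[:budget]
```

On the same three nodes and a budget of two, each policy keeps a different pair, which is why policy choice shows up in a composite retention score.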
Agent Memory
B.Y. Yan, Chaofan Li et al.
arXiv 2025 · 2025
General Agentic Memory (GAM) combines a Memorizer, Researcher, page-store, and memory to keep full trajectories while constructing lightweight guidance for deep research. On RULER 128K retrieval, GAM achieves 97.70% accuracy compared to 94.25% for RAG using GPT-4o-mini, while also reaching 64.07 F1 on HotpotQA-56K.
Agent Memory · Memory Architecture
Chris Latimer, Nicoló Boschi et al.
· 2025
HINDSIGHT organizes agent memory into four networks via TEMPR and layers CARA on top to retain, recall, and reflect with explicit opinions and behavioral profiles. On LongMemEval, HINDSIGHT with Gemini-3 Pro scores 91.4% overall versus 60.2% for full-context GPT-4o, while HINDSIGHT with OSS-20B jumps from 39.0% to 83.6% over a full-context OSS-20B baseline.
Agent Memory
Tejas Pawar, Sarika Patil et al.
· 2025
IMDMR combines a Memory Storage Layer, Multi-Dimensional Search Engine, Intelligent Query Processor, and Response Generation Module to retrieve conversational memories across semantic, entity, category, intent, context, and temporal dimensions. On the synthetic 1,000-conversation benchmark, IMDMR-Prod achieves an overall score of 0.792 compared to 0.207 for spaCy + RAG, a 3.8× improvement.
Agent Memory · Long-Term Memory · Memory Architecture
Zhengjun Huang, Zhoujin Tian et al.
· 2025
LiCoMemory organizes long-term dialogue with CogniGraph, Query Processing and Integrated Rerank, and Real Time Interactions to keep session summaries, triples, and chunks linked. On LongMemEval with GPT-4o-mini, LiCoMemory reaches 73.80% accuracy and 76.63% recall, beating Mem0g by 9.0 and 7.1 points.
Agent Memory
Haoran Tan, Zeyu Zhang et al.
ACL 2025 · 2025
MemBench evaluates LLM-based agents with multi-scenario datasets, multi-level memory content, and a time-aware benchmark using components like Multi-scenario Dataset, Multi-level Memory Content, and Multi-metric Evaluation. MemBench shows that mechanisms such as GenerativeAgent, MemGPT, MemoryBank, and SCMemory can drop from accuracies around 0.7 on 10k-token settings to roughly 0.3–0.4 at 100k tokens, exposing clear capacity limits.
Benchmark · Agent Memory · Memory Architecture
Guibin Zhang, Haotian Ren et al.
· 2025
MemEvolve decomposes agent memory into Encode, Store, Retrieve, and Manage modules and meta-evolves these components via a dual-evolution process over candidate architectures. On xBench DeepSearch, MemEvolve with GPT-5-mini raises Flash Searcher pass@1 from 69.0 to 74.0 and WebWalkerQA accuracy from 58.82 to 61.18 while keeping API cost near 0.141 per query.
Benchmark · Agent Memory
Samarth Sarin, Lovepreet Singh et al.
· 2025
Memoria augments LLM chats with structured conversation logging, a dynamic user persona built via a KG, session-level memory for real-time context, and retrieval for context-aware responses, providing persistent, interpretable memory. On LongMemEval’s single-session-user and knowledge-update subsets, Memoria reaches 87.1% and 80.8% accuracy respectively, surpassing A-Mem (OpenAI) while using much shorter prompts.
Benchmark · Agent Memory
MemoriesDB stores each Memory Record, Edges and Relations, and the Temporal Semantic Stack inside PostgreSQL with pgvector, exposing unified temporal–semantic–relational queries. MemoriesDB’s main result is a working implementation that demonstrates scalable time-bounded recall and hybrid semantic–structural queries on commodity SQL infrastructure without specialized vector or graph engines.
RAG · Benchmark · Agent Memory
Yuyang Hu, Shichun Liu et al.
· 2025
Memory in the Age of AI Agents formalizes agent memory with Memory Formation, Memory Evolution, and Memory Retrieval operators, and classifies memories into token-level, parametric, and latent forms plus factual, experiential, and working functions. Memory in the Age of AI Agents’ main result is a unified Forms–Functions–Dynamics framework that consolidates fragmented LLM agent memory work, benchmarks, and open-source frameworks into a coherent taxonomy.
Benchmark · Agent Memory
Bowen Jiang, Yuan Yuan et al.
· 2025
PersonaMem-v2 combines PERSONAMEM-V2: IMPLICIT PERSONAS, RL with Long-Context Reasoning, RL with Agentic Memory, and a User Privacy-Aware Design to train Qwen3-4B with GRPO on implicit user preferences from long, noisy histories. PersonaMem-v2 achieves 55.2% MCQ and 60.7% open-ended accuracy on PERSONAMEM-V2, surpassing GPT-5-Chat’s 45.6% and 46.2% while using a 2k-token agentic memory instead of full 32k–128k contexts.
RAG · Benchmark · Agent Memory · Memory Architecture
Maitreyi Chatterjee, Devansh Agarwal
· 2025
Semantic Anchoring enriches conversational memory by combining a hybrid memory store with dense and symbolic indexes, structured memory representation tuples, hybrid storage and indexing, and a retrieval scoring method. On MultiWOZ-Long, Semantic Anchoring reaches 83.5% Factual Recall and 80.8% Discourse Coherence, beating Entity-RAG by 7.6 and 8.6 points respectively.
Agent Memory
Bo Wang, Weiyi He et al.
· 2025
MEXTRA crafts black-box attacking prompts and automated diverse prompt generators that target the memory module, similarity scoring function, retrieval depth, memory size, and LLM backbone. MEXTRA extracts 50 queries from a 200-record EHRAgent memory and 26 from RAP, with extracted efficiency up to 0.42 compared to weaker baselines without workflow-aligned prompts.
Agent Memory · Memory Architecture
Jiali Cheng, Anjishnu Kumar et al.
· 2025
WebATLAS combines a Planner, Actor, Critic, and Multi-layered Memory (Working Memory, Cognitive Map, Semantic Memory) to simulate and score actions before executing them on the web. On WebArena-Lite, WebATLAS achieves 63.0% average success versus 53.9% for Plan-and-Act, a +9.1 point gain without website-specific fine-tuning.