Semantic Anchoring in Agentic Memory: Leveraging Linguistic Structures for Persistent Conversational Context

Authors: Maitreyi Chatterjee, Devansh Agarwal

2025

TL;DR

Semantic Anchoring uses hybrid symbolic–neural memory with dependency parses, coreference chains, and discourse tags to raise Factual Recall to 83.5% on MultiWOZ-Long (+7.6 over Entity-RAG).



THE PROBLEM

Long-term agents lose nuanced conversational context with dense-only RAG

Semantic Anchoring targets a weakness of typical RAG systems: dialogue history stored as dense vectors neglects syntactic dependencies, discourse relations, and coreference links.

When these linguistic structures are ignored, multi-session assistants misinterpret paraphrases and implicit references, degrading factual recall and discourse coherence in persistent conversations.

HOW IT WORKS

Semantic Anchoring — hybrid symbolic and neural memory

Semantic Anchoring rests on four components: a linguistically enriched memory representation, hybrid storage and indexing, retrieval scoring, and integration with the LLM, so that each memory entry encodes rich linguistic structure.

You can think of Semantic Anchoring like a card catalog plus a search engine: dense embeddings act as fuzzy search, while symbolic indexes act as labeled drawers for entities, syntax, and discourse.

By anchoring retrieval in explicit linguistic structure, Semantic Anchoring recalls cross-session references and discourse roles that a plain context window or pure vector RAG would miss.
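To make the analogy concrete, here is a minimal Python sketch of that two-sided index. The `HybridIndex` class and its 0.5 fusion weight are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# A minimal sketch of the "card catalog plus search engine" idea:
# dense embeddings give fuzzy search, a symbolic entity index gives
# labeled drawers. Illustrative only -- the fusion weight of 0.5 is
# an arbitrary placeholder, not a tuned value from the paper.

class HybridIndex:
    def __init__(self):
        self.texts = []        # memory contents
        self.vectors = []      # dense side: unit-norm embedding per memory
        self.entities = []     # symbolic side: entity IDs per memory

    def add(self, text, vector, entity_ids):
        self.texts.append(text)
        self.vectors.append(vector / np.linalg.norm(vector))
        self.entities.append(set(entity_ids))

    def search(self, query_vec, query_entities, k=3):
        q = query_vec / np.linalg.norm(query_vec)
        scored = []
        for i, v in enumerate(self.vectors):
            dense = float(np.dot(q, v))                          # fuzzy search
            drawer_hits = len(self.entities[i] & query_entities) # exact keys
            scored.append((dense + 0.5 * drawer_hits, i))        # toy fusion
        scored.sort(reverse=True)
        return [self.texts[i] for _, i in scored[:k]]
```

Dense similarity surfaces paraphrases; the entity drawers keep exact references retrievable even when the wording drifts across sessions.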

DIAGRAM

Query-time retrieval pipeline in Semantic Anchoring

This diagram shows how Semantic Anchoring processes a query and fuses dense and symbolic retrieval for memory selection.

DIAGRAM

Evaluation setup and ablation design for Semantic Anchoring

This diagram shows how Semantic Anchoring is evaluated on MultiWOZ-Long and DialogRE-L with baselines and ablations.

PROCESS

How Semantic Anchoring Handles a Multi-session Query

  1. Syntactic parsing

     Semantic Anchoring runs a biaffine dependency parser to build D_i, capturing head-modifier relations and grammatical roles for each utterance.

  2. Coreference resolution

     Semantic Anchoring applies coreference resolution to produce entity clusters E_i with persistent IDs, unifying pronouns, nominal mentions, and named entities across dialogue turns.

  3. Discourse tagging

     Semantic Anchoring performs discourse tagging to assign C_i, labeling relations such as Elaboration, Contrast, or Cause between an utterance and prior turns.

  4. Hybrid storage and retrieval scoring

     Semantic Anchoring stores v_i in the dense index and symbolic keys in the symbolic index, then computes score(M_i, q) with tuned weights λ_s, λ_e, λ_c to select the top-k memories (see the sketch after this list).
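Putting the four steps together, here is a minimal Python sketch of one memory tuple M_i = (v_i, D_i, E_i, C_i) and the fused retrieval score. It is a hedged reconstruction, not the authors' code: the λ values are placeholders rather than the paper's tuned weights, and the weighted-sum form is our reading of the scoring description.

```python
import numpy as np
from dataclasses import dataclass

# Sketch of a memory entry M_i = (v_i, D_i, E_i, C_i) and the fused
# retrieval score. Illustrative reconstruction: parsing, coreference,
# and discourse tagging are assumed to have run upstream (e.g., a
# biaffine dependency parser producing D_i).

@dataclass
class MemoryEntry:
    v: np.ndarray        # v_i: dense embedding of the utterance
    deps: list           # D_i: (head, relation, modifier) triples
    entities: set        # E_i: persistent entity-cluster IDs
    discourse: set       # C_i: discourse relations, e.g. {"Elaboration"}

# Placeholder weights; the paper tunes lambda_s, lambda_e, lambda_c.
LAMBDA_S, LAMBDA_E, LAMBDA_C = 0.6, 0.25, 0.15

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score(m: MemoryEntry, q_vec, q_entities, q_discourse) -> float:
    """Assumed form: lambda_s * sim(v_i, v_q)
    + lambda_e * entity_match + lambda_c * discourse_match."""
    sim = cosine(m.v, q_vec)
    entity_match = len(m.entities & q_entities) / max(len(q_entities), 1)
    discourse_match = 1.0 if m.discourse & q_discourse else 0.0
    return LAMBDA_S * sim + LAMBDA_E * entity_match + LAMBDA_C * discourse_match

def top_k(memories, q_vec, q_entities, q_discourse, k=5):
    """Rank stored memories against the query and keep the top k."""
    return sorted(memories,
                  key=lambda m: score(m, q_vec, q_entities, q_discourse),
                  reverse=True)[:k]
```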

KEY CONTRIBUTIONS

Key Contributions

  • Hybrid agentic memory architecture

    Semantic Anchoring introduces a hybrid agentic memory architecture that integrates dependency parses D_i, discourse relations C_i, and coreference chains E_i into memory representation tuples M_i.

  • Retrieval scoring method

    Semantic Anchoring proposes a retrieval scoring method score(M_i, q) that combines cosine similarity sim(v_i, v_q) with entity match and discourse match via tuned weights λ_s, λ_e, λ_c (written out after this list).

  • Extensive evaluation and ablations

    Semantic Anchoring is evaluated on MultiWOZ-Long and DialogRE-L, achieving 83.5% FR and 80.8% DC on MultiWOZ-Long, with ablations showing drops of up to 11.9 points when symbolic features are removed.
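Written out, the scoring method plausibly takes the following weighted form; the linear combination is our reading of "combines ... using tuned λ_s, λ_e, λ_c", not a formula quoted from the paper:

$$\mathrm{score}(M_i, q) \;=\; \lambda_s \,\mathrm{sim}(v_i, v_q) \;+\; \lambda_e \,\mathrm{match}_E(E_i, q) \;+\; \lambda_c \,\mathrm{match}_C(C_i, q)$$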

RESULTS

By the Numbers

  • Factual Recall (FR): 83.5% (+7.6 points over Entity-RAG)

  • Discourse Coherence (DC): 80.8% (+8.6 points over Entity-RAG)

  • UCS: 4.3 / 5 (0.6 higher than Entity-RAG)

  • Retrieval latency: 120 ms dense + 40 ms symbolic, with fusion adding 15 ms on A100 hardware (≈175 ms per query in total)

On MultiWOZ-Long, which stresses cross-session recall of entities and facts, Semantic Anchoring improves Factual Recall from 75.9% to 83.5% and Discourse Coherence from 72.2% to 80.8%. These gains show that Semantic Anchoring leverages linguistic structure to maintain long-range conversational context better than Entity-RAG and Vector RAG.


BENCHMARK

Overall performance on MultiWOZ-Long

FR (%) on MultiWOZ-Long for Semantic Anchoring and baselines.

KEY INSIGHT

The Counterintuitive Finding

Semantic Anchoring shows that removing discourse tagging alone drops Factual Recall from 83.5% to 78.8%, a 4.7-point loss on MultiWOZ-Long.

This is surprising because many practitioners assume discourse labels are secondary, yet Semantic Anchoring demonstrates they matter almost as much as coreference and dependency features for long-term recall.

WHY IT MATTERS

What this unlocks for the field

Semantic Anchoring unlocks conversational agents that maintain entity continuity, syntactic nuance, and discourse roles across many sessions without relying solely on large context windows.

With Semantic Anchoring, builders can design memory systems that are both more interpretable and more robust, enabling user-editable memories and symbolic debugging that were impractical with pure dense RAG.
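As a toy illustration of that interpretability claim: because memories are filed under explicit symbolic keys, a user-facing edit can rewrite one entity cluster without re-embedding anything. The `rename_entity` helper and the entity IDs below are hypothetical.

```python
# Hypothetical illustration of user-editable symbolic memory: since
# entities are stored as explicit keys, a correction rewrites one
# cluster ID instead of re-embedding the dialogue history.

def rename_entity(entity_index, old_id, new_id):
    """Merge all memories filed under old_id into new_id."""
    mems = entity_index.pop(old_id, set())
    entity_index.setdefault(new_id, set()).update(mems)

entity_index = {"ENT_hotel_12": {0, 3}, "ENT_user_home": {1}}
rename_entity(entity_index, "ENT_hotel_12", "ENT_hotel_marriott")
print(entity_index)  # {'ENT_user_home': {1}, 'ENT_hotel_marriott': {0, 3}}
```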


