Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models

Authors: Mehrdad Farahani, Richard Johansson

2024

TL;DR

The paper applies causal mediation analysis to ATLAS and shows that non-parametric copying dominates parametric recall when retrieved contexts conflict with the model's stored knowledge.



THE PROBLEM

RAG Copies Wrong Facts When Context Conflicts With Parametric Memory

The paper reports statistically different total-effect (TE) behavior between the parametric and non-parametric subsets (p = 1.60e-4, Cohen's d = -0.9851), with the non-parametric subset's answers being highly variable.

When ATLAS receives a conflicting context such as "Milan became the official capital of Sweden," the paper finds that ATLAS often copies the counterfactual with high probability, degrading factual reliability.

HOW IT WORKS

Causal Mediation Analysis Inside ATLAS

The paper combines two intervention experiments with causal mediation analysis and Path-Specific Effects (PSE) to isolate copying versus recall inside ATLAS (the standard quantities are sketched below).

Conceptually, the paper treats ATLAS like a computer in which the parametric weights are long-term storage and the retrieved passages are a volatile clipboard that can overwrite answers.

This token-level mediation tracing exposes when ATLAS chooses context copying over parametric recall, beyond what inspecting the context window alone reveals.
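For reference, these are the causal-mediation quantities as they are conventionally defined in causal-tracing work (a sketch following the usual convention; the paper's exact normalization may differ). Here x is the clean input, x* the corrupted or counterfactual input, o the answer token, and h ← h_clean denotes restoring one hidden state from the clean run:

```latex
\mathrm{TE} \;=\; \mathbb{P}_{x^{*}}(o) \;-\; \mathbb{P}_{x}(o)
\qquad \text{(total effect of the intervention on the answer)}

\mathrm{IE}_{h} \;=\; \mathbb{P}_{x^{*},\, h \leftarrow h_{\text{clean}}}(o) \;-\; \mathbb{P}_{x^{*}}(o)
\qquad \text{(indirect effect of restoring state } h \text{)}

\mathrm{AIE}_{h} \;=\; \mathbb{E}\!\left[\mathrm{IE}_{h}\right]
\qquad \text{(averaged over examples)}
```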

DIAGRAM

Token Level Mediation Flow During Copying and Relevance Decisions

This diagram shows how the paper runs corrupted and restoration passes through the ATLAS layers to compute indirect effects for specific tokens (a minimal runnable sketch of this loop follows).
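A minimal sketch of the corrupt-then-restore loop, shown on a toy embedding + MLP stack rather than ATLAS itself; all names here are illustrative, not from the paper:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim, seq_len = 100, 16, 5
embed = nn.Embedding(vocab, dim)
layers = nn.ModuleList(
    [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(2)]
)
unembed = nn.Linear(dim, vocab)
noise = 3.0 * torch.randn(1, dim)  # fixed corruption noise, reused across runs

def run(ids, corrupt_pos=None, restore=None):
    """Forward pass; optionally corrupt one position's embedding, and
    optionally restore a cached clean hidden state at (layer, position)."""
    h = embed(ids).clone()
    if corrupt_pos is not None:
        h[:, corrupt_pos] += noise
    cache = [h.clone()]  # cache[i + 1] = hidden state after layer i
    for li, layer in enumerate(layers):
        h = layer(h)
        if restore is not None and li == restore[0]:
            _, pos, clean_cache = restore
            h = h.clone()
            h[:, pos] = clean_cache[li + 1][:, pos]  # patch in the clean state
        cache.append(h.clone())
    return torch.softmax(unembed(h[:, -1]), dim=-1), cache

ids = torch.randint(0, vocab, (1, seq_len))
answer = 7  # arbitrary "answer" token for the demo

with torch.no_grad():
    p_clean, clean_cache = run(ids)
    p_corrupt, _ = run(ids, corrupt_pos=2)  # corrupt e.g. the object token
    te = (p_clean[0, answer] - p_corrupt[0, answer]).item()
    p_restored, _ = run(ids, corrupt_pos=2, restore=(1, 2, clean_cache))
    ie = (p_restored[0, answer] - p_corrupt[0, answer]).item()

print(f"TE = {te:.4f}   IE(layer=1, pos=2) = {ie:.4f}")
```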

DIAGRAM

Evaluation Pipeline Across PopQA and PEQ

This diagram shows how the paper constructs synthetic contexts, filters questions, and groups model behaviors into parametric versus non-parametric subsets (a toy construction sketch follows).
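A hedged sketch of what such construction can look like: one relation template turns a (subject, relation, object) triple into a question, a factual context, and a counterfactual context with the object swapped. The template wording and helper names are illustrative, not the paper's exact implementation:

```python
# Illustrative triple-to-context construction (template text is a guess,
# not the paper's exact wording).
TEMPLATES = {
    "capital_of": (
        "What is the capital of {subj}?",
        "{obj} is the capital of {subj}.",
    ),
}

def make_example(subj, rel, true_obj, counter_obj):
    question, ctx = TEMPLATES[rel]
    return {
        "question": question.format(subj=subj),
        "factual_context": ctx.format(subj=subj, obj=true_obj),
        # Same template with the object swapped for a counterfactual entity:
        "counterfactual_context": ctx.format(subj=subj, obj=counter_obj),
    }

ex = make_example("Sweden", "capital_of", "Stockholm", "Milan")
print(ex["counterfactual_context"])  # -> "Milan is the capital of Sweden."
```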

PROCESS

How the Paper's Pipeline Handles a Question Answering Query

  1. Data Preparation

     The paper uses PopQA and EntityQuestions (PEQ) to build synthetic contexts from factual triples and query templates.

  2. Experiment 1

     The paper replaces object tokens with counterfactuals to measure copying via total and indirect effects.

  3. Experiment 2

     The paper injects noise into subject and relation embeddings to study context-relevance decisions.

  4. Path-Specific Effects

     The paper severs the MLP and attention paths to attribute copying and relevance to specific transformer modules (see the path-severing sketch after this list).
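As noted in step 4, here is a toy sketch of the path-severing idea behind PSE: clamp one module class (here, the MLP sublayers) to its corrupted-run outputs so that any recovered answer probability cannot flow through that path. This is a loose analogue on toy layers under my own naming, not ATLAS internals:

```python
# Toy path-specific effect: clamp MLP outputs to corrupted-run values and
# measure how much of the answer probability depended on the MLP path.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim, seq_len, n_layers = 100, 16, 5, 2
embed = nn.Embedding(vocab, dim)
attn = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_layers)])  # attention stand-in
mlp = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_layers)])
unembed = nn.Linear(dim, vocab)
noise = 3.0 * torch.randn(1, dim)

def run(ids, corrupt=False, frozen_mlp=None):
    """Residual stream with attn + mlp sublayers. `frozen_mlp` severs the
    MLP path by replacing its outputs with recorded (corrupted-run) values."""
    h = embed(ids).clone()
    if corrupt:
        h[:, 1] += noise  # corrupt the "subject" position
    mlp_outs = []
    for li in range(n_layers):
        h = h + attn[li](h)
        m = mlp[li](h) if frozen_mlp is None else frozen_mlp[li]
        mlp_outs.append(m.detach().clone())
        h = h + m
    return torch.softmax(unembed(h[:, -1]), dim=-1), mlp_outs

ids = torch.randint(0, vocab, (1, seq_len))
answer = 7

with torch.no_grad():
    p_clean, _ = run(ids)
    _, corrupt_mlp = run(ids, corrupt=True)
    # Clean input, but the MLP path carries corrupted-run values:
    p_severed, _ = run(ids, frozen_mlp=corrupt_mlp)
    pse_mlp = (p_clean[0, answer] - p_severed[0, answer]).item()

print(f"Effect attributable to the MLP path: {pse_mlp:.4f}")
```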

KEY CONTRIBUTIONS

Key Contributions

  • Examine how ATLAS uses different types of memory

    The paper applies causal mediation analysis to ATLAS to separate parametric recall from non-parametric copying at the token and layer level.

  • Show when ATLAS prefers one memory type

    The paper demonstrates that overall ATLAS behavior resembles the non-parametric subset, with TE distributions shifting strongly toward counterfactuals when contexts change.

  • Identify the parts crucial for copying and relevance

    The paper finds that mid-layer MLP blocks and object tokens dominate copying, while subject and relation tokens drive context-relevance decisions.

RESULTS

By the Numbers

  • TE, parametric vs non-parametric subsets: p = 1.60e-4, Cohen's d = -0.9851, a large effect size in the TE distribution relative to the parametric subset.

  • TE, subject vs relation tokens: p = 3.57e-3, Cohen's d = -6.87e-2, a small effect size showing that subjects and relations contribute similarly to relevance.

The paper evaluates ATLAS on synthetic contexts built from PopQA and EntityQuestions (PEQ), focusing on TE, AIE, and PSE rather than raw accuracy. These statistics show that non-parametric copying dominates behavior and that subject and relation tokens contribute comparably to relevance decisions (a minimal reproduction sketch follows).
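For readers who want to reproduce this kind of comparison, a minimal sketch of the two reported statistics, Welch's t-test p-value and Cohen's d, on placeholder TE arrays (the synthetic numbers below are illustrative, not the paper's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
te_parametric = rng.normal(0.10, 0.05, 200)      # placeholder TE values
te_non_parametric = rng.normal(0.35, 0.30, 200)  # higher mean and variance

# Welch's t-test (unequal variances) between the two TE distributions.
t, p = stats.ttest_ind(te_parametric, te_non_parametric, equal_var=False)

def cohens_d(a, b):
    """Pooled-SD Cohen's d; negative when mean(a) < mean(b)."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

print(f"p = {p:.2e}   d = {cohens_d(te_parametric, te_non_parametric):.4f}")
```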

BENCHMARK

TE Distribution Comparison Between Behaviors and Token Types

Relative strength of total effects (TE) and effect sizes for parametric vs non-parametric behavior and for subject vs relation relevance.

KEY INSIGHT

The Counterintuitive Finding

The paper shows that non-parametric behavior differs from parametric behavior with Cohen's d = -0.9851, indicating strong copying from altered contexts.

This is surprising because ATLAS is a knowledge-intensive RAG system, yet the paper finds that external context can override memorized facts even when the counterfactual is obvious.

WHY IT MATTERS

What this unlocks for the field

The paper gives a concrete recipe for localizing copying and relevance mechanisms in RAG models using causal mediation and PSE.

Armed with these insights, builders can design ATLAS-like systems that explicitly modulate copying versus recall by targeting mid-layer MLPs, subject tokens, relation tokens, and object-token pathways.


Related papers

RAG

A Dynamic Retrieval-Augmented Generation System with Selective Memory and Remembrance

Okan Bursa

· 2026

Adaptive RAG Memory (ARM) augments a standard retriever–generator stack with a Dynamic Embedding Layer and Remembrance Engine that track usage statistics and apply selective remembrance and decay to embeddings. On a lightweight retrieval benchmark, ARM achieves NDCG@5 ≈ 0.9401 and Recall@5 = 1.000 with 22M parameters, matching larger baselines like gte-small while providing the best efficiency among ultra-efficient models.

RAG · Long-Term Memory

HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues

Yijie Zhong, Yunfan Gao, Haofen Wang

· 2026

HingeMem combines Boundary Guided Long-Term Memory, Dialogue Boundary Extraction, Memory Construction, Query Adaptive Retrieval, Hyperedge Rerank, and Adaptive Stop to segment dialogues into element-indexed hyperedges and plan query-specific retrieval. On LOCOMO, HingeMem achieves 63.9 overall F1 and 75.1 LLM-as-a-Judge score, surpassing the best baseline Zep (56.9 F1) by 7.0 F1 without using category-specific QA formats.
