LiCoMemory: Lightweight and Cognitive Agentic Memory for Efficient Long-Term Reasoning

Authors: Zhengjun Huang, Zhoujin Tian, Qintian Guo et al.

2025

TL;DR

LiCoMemory uses the hierarchical CogniGraph index with hierarchy- and temporally-aware reranking to reach 73.80% accuracy on LongMemEval, +9.0 points over Mem0g.



THE PROBLEM

Graph-based agents waste time and return scattered memories

Existing graph-based memory systems such as GraphRAG can require up to 20 minutes for graph construction and over 2 minutes of query latency per dialogue.

These heavy, flat graphs mix semantics and topology, causing redundant nodes, scattered retrieval, and degraded long-term reasoning in conversational agents.

HOW IT WORKS

CogniGraph and hierarchy- and temporally-aware retrieval

LiCoMemory centers on three components, CogniGraph, Query Processing with Integrated Rerank, and Real-Time Interactions, which together decouple storage from semantic indexing across three layers.

You can think of CogniGraph as a card catalog for memory, where summaries are shelves, triples are index cards, and chunks are the books on the back wall.

This design lets LiCoMemory navigate from high-level summaries down to timestamped triples and raw chunks, enabling temporally-aware reasoning far beyond a plain context window.
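Following the card-catalog analogy, the three linked layers can be sketched as a small data structure. All class and field names below are illustrative assumptions, not the paper's actual schema; the only idea taken from the text is that summaries link to triples and triples link to chunks via unique identifiers.

```python
from dataclasses import dataclass, field

# Minimal sketch of CogniGraph's three linked layers; names are hypothetical.

@dataclass
class Chunk:
    """Raw dialogue text: the 'books on the back wall'."""
    chunk_id: str
    text: str
    timestamp: float

@dataclass
class Triple:
    """An entity-relation 'index card', linked down to its source chunks."""
    head: str
    relation: str
    tail: str
    timestamp: float
    chunk_ids: list[str] = field(default_factory=list)

@dataclass
class SessionSummary:
    """A high-level 'shelf', linked down to its triples."""
    session_id: str
    summary: str
    triple_ids: list[int] = field(default_factory=list)

class CogniGraph:
    def __init__(self) -> None:
        self.summaries: dict[str, SessionSummary] = {}
        self.triples: dict[int, Triple] = {}
        self.chunks: dict[str, Chunk] = {}

    def resolve(self, session_id: str) -> list[Chunk]:
        """Top-down navigation: summary -> triples -> raw chunks."""
        summary = self.summaries[session_id]
        chunk_ids = {cid
                     for tid in summary.triple_ids
                     for cid in self.triples[tid].chunk_ids}
        return [self.chunks[cid] for cid in sorted(chunk_ids)]
```

Keeping each layer in its own store is what lets updates touch one summary or a few triples without rebuilding the whole graph.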

DIAGRAM

Real time query and update flow in LiCoMemory

This diagram shows how LiCoMemory processes a user query, retrieves through CogniGraph, and then performs real time memory updates.

DIAGRAM

Evaluation pipeline and ablation design for LiCoMemory

This diagram shows how LiCoMemory is evaluated on LongMemEval and LoCoMo, including ablations of structured retrieval, temporal awareness, and summaries.

PROCESS

How LiCoMemory Handles a Long Term Dialogue Session

  1.

    CogniGraph: A Lightweight and Semantically Aware Graph Structure

    LiCoMemory builds CogniGraph from session-level summaries, entity-relation triples, and chunk-level dialogue, linking them via unique identifiers for hierarchical indexing.

  2.

    Query Processing and Integrated Rerank

    LiCoMemory extracts query entities, matches them to session summaries, retrieves triples, and computes the semantic score Ssem via a harmonic mean and the reranking score R(t) via Weibull-based temporal decay.

  3.

    Real Time Interactions

    LiCoMemory retrieves evidence for the agent response, then incrementally updates session summaries, performs triple extraction and deduplication, and links new chunks.

  4.

    Hierarchy and Temporally Sensitive Retrieval

    LiCoMemory performs top down retrieval from summaries to triples to chunks, guided by hierarchical structure and temporal relevance to maintain coherent answers.
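The scoring in steps 02 and 04 can be sketched as follows. The harmonic-mean combination of session-level relevance Ss and triple-level relevance St, and the Weibull-type decay w(Δτ), follow the description above; the shape/scale parameters and the multiplicative combination into R(t) are illustrative assumptions, not the paper's exact formulation.

```python
import math

# Hedged sketch of hierarchy- and temporally-aware rerank scoring.

def harmonic_mean(ss: float, st: float) -> float:
    """Semantic score Ssem: penalizes candidates weak at either level."""
    if ss + st == 0:
        return 0.0
    return 2.0 * ss * st / (ss + st)

def weibull_decay(dt: float, shape: float = 1.5, scale: float = 72.0) -> float:
    """Temporal weight w(dt) = exp(-(dt/scale)**shape), where dt is the age
    of the memory (e.g. in hours). Parameter values here are hypothetical."""
    return math.exp(-((dt / scale) ** shape))

def rerank_score(ss: float, st: float, dt: float) -> float:
    """R(t): semantic relevance discounted by temporal decay."""
    return harmonic_mean(ss, st) * weibull_decay(dt)
```

The harmonic mean is a natural choice here because a candidate must be relevant at both the session and triple level to score well, while the decay term demotes stale memories without discarding them outright.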

KEY CONTRIBUTIONS

Key Contributions

  •

    CogniGraph for semantic organization

    LiCoMemory introduces CogniGraph with session-level summaries, entity-relation triples, and chunk-level dialogue, decoupling storage from indexing and enabling lightweight updates.

  •

    Hierarchy and temporally sensitive retrieval

    LiCoMemory combines session-level relevance Ss, triple-level relevance St, and a Weibull-based decay w(Δτ) to compute R(t) for temporally-aware retrieval.

  •

    Efficient and real time memory operations

    LiCoMemory reduces graph construction latency on LoCoMo from 1772s (Mem0) to 21s and cuts construction tokens from 49.3k to 13.52k while improving accuracy.
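The real-time update path behind the third contribution, triple extraction followed by deduplication before linking, can be sketched as a normalized-key lookup. The normalization rule and class names here are hypothetical, not the paper's implementation.

```python
# Hedged sketch of incremental triple insertion with deduplication.

def normalize(triple: tuple[str, str, str]) -> tuple[str, str, str]:
    """Case- and whitespace-insensitive key for duplicate detection."""
    head, relation, tail = triple
    return (head.strip().lower(), relation.strip().lower(), tail.strip().lower())

class MemoryUpdater:
    def __init__(self) -> None:
        self.seen: set[tuple[str, str, str]] = set()
        self.triples: list[tuple[str, str, str]] = []

    def add_triples(self, new_triples):
        """Insert only triples whose normalized key is unseen; return them."""
        added = []
        for t in new_triples:
            key = normalize(t)
            if key not in self.seen:
                self.seen.add(key)
                self.triples.append(t)
                added.append(t)
        return added
```

Deduplicating at insertion time, rather than rebuilding the graph, is what keeps updates cheap enough for interactive use.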

RESULTS

By the Numbers

Acc.

73.80%

+9.0 over Mem0g

Rec.

76.63%

+7.10 over Mem0g

Retrieval time (TR)

1.61s

0.14s faster than Mem0g on LongMemEval with GPT-4o mini

Retrieval tokens (KR)

1.7k

1.2k fewer tokens than Zep on LongMemEval with GPT-4o mini

On LongMemEval, which tests single-session, multi-session, temporal-reasoning, and knowledge-update questions, LiCoMemory achieves 73.80% accuracy and 76.63% recall with GPT-4o mini. These gains show that LiCoMemory retrieves more relevant long-term context while using fewer tokens and lower latency than Mem0g and MemOS.


BENCHMARK

Evaluation on long-term memory QA benchmarks using GPT-4o mini

Accuracy on LongMemEval with GPT-4o mini.

BENCHMARK

Real time interaction performance on LoCoMo

Accuracy on LoCoMo in real time interaction setting.

KEY INSIGHT

The Counterintuitive Finding

LiCoMemory cuts construction latency on LoCoMo from 1772s (Mem0) to 21s while increasing accuracy from 54.68% to 66.4%.

This is surprising because graph-based memory is usually assumed to be slower than vector memory, yet CogniGraph shows an 84x speedup with better accuracy.

WHY IT MATTERS

What this unlocks for the field

LiCoMemory enables agents to maintain hierarchical, temporally-aware memories that update in real time without heavy graph reconstruction.

Builders can now deploy long-running assistants that answer temporal and multi-session questions accurately while keeping token budgets and latency low enough for interactive use.


Related papers

Benchmark · Agent Memory

Active Context Compression: Autonomous Memory Management in LLM Agents

Nikhil Verma

· 2026

Focus Agent adds start_focus, complete_focus, a persistent Knowledge block, and an optimized Persistent Bash plus String-Replace Editor scaffold to actively compress context during long software-engineering tasks. On five hard SWE-bench Lite instances against a Baseline ReAct agent, Focus Agent achieves 22.7% token reduction (14.9M → 11.5M) while matching 3/5 = 60% task success.

Agent Memory

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Xiaohui Zhang, Zequn Sun et al.

· 2026

ActMem transforms dialogue history into atomic facts via Memory Fact Extraction, groups them with Fact Clustering, links them through a Memory KG Construction module, and uses Counterfactual-based Retrieval and Reasoning for action-aware answers. On ActMemEval, ActMem reaches 76.52% QA accuracy with DeepSeek-V3, beating LightMem’s 63.97% by 12.55 points and NaiveRAG’s 61.54%.
