APEX-MEM: Agentic Semi-Structured Memory with Temporal Reasoning for Long-Term Conversational AI

Authors: Pratyay Banerjee, Masud Moshtaghi, Shivashankar Subramanian et al.

2026

TL;DR

APEX-MEM uses an append-only temporal property graph plus multi-tool Graph QnA agents to reach 88.88% accuracy on LOCOMO, +3.50 points over MIRIX.

THE PROBLEM

Long-context models degrade under extended, noisy history (51.6% to 15.7% F1)

LLMs with larger context windows still struggle with long conversations: GPT-4-Turbo drops from 51.6% F1 to 15.7% F1 under adversarial noise.

This breakdown undermines long-term conversational memory, leading to inconsistent entities, broken temporal coherence, and unreliable answers across multi-session dialogues.

HOW IT WORKS

APEX-MEM — Property graphs, append-only events, and graph agents

APEX-MEM combines an Ontology, Entity and Property Resolution, Fact Extraction, and Graph Agents to build a temporal property graph that stores evolving conversational facts.

Think of APEX-MEM as a card catalog plus a timeline: entities are the cards, events are dated timeline entries, and Graph Agents are librarians using specialized tools.

This design lets APEX-MEM resolve conflicting facts at query time, track when each fact is valid, and answer complex temporal and multi-hop questions that a plain context window cannot reliably handle.
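To make the append-only idea concrete, here is a minimal Python sketch, assuming a flat in-memory log rather than the paper's property graph. The `Event` and `AppendOnlyMemory` names are hypothetical. New facts are only ever appended, and a conflict such as a changed address is resolved when a question is asked, by picking the latest assertion valid on the queried date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One immutable assertion: subject.prop = value, valid from valid_from."""
    subject: str
    prop: str
    value: str
    valid_from: date


class AppendOnlyMemory:
    """Hypothetical minimal store: facts are appended, never overwritten."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        # There is no update or delete path; older facts stay as history.
        self._events.append(event)

    def value_at(self, subject: str, prop: str, when: date) -> Optional[str]:
        """Query-time conflict resolution: latest assertion valid on `when`."""
        candidates = [
            e for e in self._events
            if e.subject == subject and e.prop == prop and e.valid_from <= when
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda e: e.valid_from).value

    def history(self, subject: str, prop: str) -> list[Event]:
        """All assertions for (subject, prop), oldest first."""
        return sorted(
            (e for e in self._events if e.subject == subject and e.prop == prop),
            key=lambda e: e.valid_from,
        )


mem = AppendOnlyMemory()
mem.append(Event("Alice", "lives_in", "Boston", date(2023, 1, 10)))
mem.append(Event("Alice", "lives_in", "Seattle", date(2023, 6, 2)))
print(mem.value_at("Alice", "lives_in", date(2023, 3, 1)))  # Boston
print(mem.value_at("Alice", "lives_in", date(2023, 7, 1)))  # Seattle
```

Because nothing is overwritten, the earlier value remains available for questions like "where did Alice live in March?", which an overwrite-in-place store would lose.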

DIAGRAM

APEX-MEM Graph QnA Agent Tool-use Sequence

This diagram shows how the APEX-MEM Graph QnA agent uses SCHEMAVIEWER, ENTITYLOOKUP, GRAPHSQL, and SEARCH tools to answer a question over the property graph.
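The tool-use sequence in the diagram can be sketched as a registry of tools executed in order. This is a hypothetical illustration: the tool signatures, the toy schema and data, and the `$prev` convention for passing one tool's output to the next are assumptions, and `graph_sql` here is a plain Python filter standing in for real SQL. In APEX-MEM an LLM agent decides which tool to call next; the fixed plan below only mimics one such trajectory.

```python
from typing import Callable

# Toy graph contents (hypothetical, not the paper's schema or data).
SCHEMA = {"Person": ["name", "lives_in"], "Event": ["subject", "type", "date"]}
ENTITIES = {"alice": {"class": "Person", "aliases": ["Alice", "Ally"]}}
EVENTS = [
    {"subject": "alice", "type": "moved", "date": "2023-06-02",
     "text": "Alice moved to Seattle"},
]


def schema_viewer(_: str) -> str:
    """Describe the available node types and their fields."""
    return ", ".join(f"{cls}({', '.join(cols)})" for cls, cols in SCHEMA.items())


def entity_lookup(mention: str) -> str:
    """Map a surface mention to a canonical entity id."""
    for eid, info in ENTITIES.items():
        if mention in info["aliases"]:
            return eid
    return "NOT_FOUND"


def graph_sql(entity_id: str) -> str:
    """Stand-in for a structured query: list events for one entity."""
    rows = [e for e in EVENTS if e["subject"] == entity_id]
    return "; ".join(f"{e['type']}@{e['date']}" for e in rows)


def search(query: str) -> str:
    """Naive keyword search over event text."""
    words = query.lower().split()
    hits = [e["text"] for e in EVENTS if any(w in e["text"].lower() for w in words)]
    return " | ".join(hits)


TOOLS: dict[str, Callable[[str], str]] = {
    "SCHEMAVIEWER": schema_viewer,
    "ENTITYLOOKUP": entity_lookup,
    "GRAPHSQL": graph_sql,
    "SEARCH": search,
}


def run_plan(plan: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Run tools in order; the argument "$prev" reuses the previous output."""
    trace: list[tuple[str, str]] = []
    prev = ""
    for tool_name, arg in plan:
        out = TOOLS[tool_name](prev if arg == "$prev" else arg)
        trace.append((tool_name, out))
        prev = out
    return trace


trace = run_plan([
    ("SCHEMAVIEWER", ""),
    ("ENTITYLOOKUP", "Ally"),
    ("GRAPHSQL", "$prev"),
    ("SEARCH", "Seattle"),
])
for step in trace:
    print(step)
```

Keeping tools as small, single-purpose functions mirrors the design described above: the agent can inspect the schema first, pin down the entity, and only then issue a structured query or fall back to text search.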

DIAGRAM

Evaluation Pipeline across LOCOMO, LongMemEval, and SealQA-Hard

This diagram shows how APEX-MEM is constructed and evaluated on LOCOMO, LongMemEval, and SealQA-Hard with different QnA agents and baselines.

PROCESS

How APEX-MEM Handles a Conversational Question

  1. APEX-MEM Graph Construction

    APEX-MEM uses Fact Extraction and Entity and Property Resolution to build an append-only temporal property graph from conversational turns.

  2. Ontology and Fact Extraction

    APEX-MEM applies the Ontology during Fact Extraction to type entities and events and to emit subject-property-value assertions annotated with temporal validity intervals.

  3. Graph Agents with Tools

    APEX-MEM Graph Agents invoke SCHEMAVIEWER, ENTITYLOOKUP, GRAPHSQL, and SEARCH to plan retrieval and reasoning over the property graph.

  4. Retrieval-Time Temporal Resolution

    APEX-MEM resolves conflicting facts at query time, running GRAPHSQL over events and facts to compute the answer that is valid at the time the question refers to.
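Retrieval-time temporal resolution can be illustrated with plain SQL. The sketch below uses Python's built-in `sqlite3` module and a hypothetical `facts` table; it is not the paper's GRAPHSQL schema. The `AS_OF` query returns the latest value asserted on or before a given date, and `INTERVALS` derives each fact's validity interval with the `LEAD` window function (SQLite 3.25 or later), so an interval ends where the next assertion begins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical append-only table: id preserves insertion order, and
# valid_from holds ISO dates, which sort chronologically as strings.
conn.execute(
    """CREATE TABLE facts (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        subject TEXT, property TEXT, value TEXT,
        valid_from TEXT
    )"""
)
conn.executemany(
    "INSERT INTO facts (subject, property, value, valid_from) VALUES (?, ?, ?, ?)",
    [
        ("alice", "employer", "Acme", "2023-01-15"),
        ("alice", "employer", "Globex", "2023-09-01"),
    ],
)

# Latest assertion on or before the given date; ties broken by insertion order.
AS_OF = """
SELECT value FROM facts
WHERE subject = ? AND property = ? AND valid_from <= ?
ORDER BY valid_from DESC, id DESC
LIMIT 1
"""

# Each fact is valid until the next assertion for the same subject/property.
INTERVALS = """
SELECT value, valid_from,
       LEAD(valid_from) OVER (ORDER BY valid_from, id) AS valid_to
FROM facts
WHERE subject = ? AND property = ?
ORDER BY valid_from, id
"""


def value_as_of(subject: str, prop: str, day: str):
    row = conn.execute(AS_OF, (subject, prop, day)).fetchone()
    return row[0] if row else None


print(value_as_of("alice", "employer", "2023-05-01"))  # Acme
print(value_as_of("alice", "employer", "2024-01-01"))  # Globex
print(conn.execute(INTERVALS, ("alice", "employer")).fetchall())
# [('Acme', '2023-01-15', '2023-09-01'), ('Globex', '2023-09-01', None)]
```

Computing `valid_to` at read time, instead of storing and updating it, keeps the table strictly append-only while still exposing explicit validity intervals.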

KEY CONTRIBUTIONS

Key Contributions

  • Hybrid entity-event ontology for conversational memory

    APEX-MEM introduces an Ontology with 35 entity classes and temporally grounded events, enabling Fact Extraction to attach subject-property-value assertions with validity intervals.

  • Append-only event storage with temporal validity

    APEX-MEM stores all facts as append-only events instead of overwriting entity records, allowing Graph Agents to perform retrieval-time temporal resolution over evolving information.

  • Multi-tool Graph QnA agent over a property graph

    APEX-MEM Graph Agents combine SCHEMAVIEWER, ENTITYLOOKUP, GRAPHSQL, and SEARCH, reaching 88.88% accuracy on LOCOMO and 86.2% on LongMemEval.
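The ontology and entity-resolution ideas above can be sketched together. Assumptions: `ONTOLOGY` is a hypothetical three-class excerpt (the paper's ontology has 35 classes, whose property lists are not reproduced here), and `EntityResolver` uses simple case- and whitespace-insensitive alias matching, far cruder than the paper's Entity and Property Resolution stage. Each extracted fact becomes a typed subject-property-value `Assertion` with an open-ended validity interval.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical excerpt: class -> allowed properties (not the paper's 35 classes).
ONTOLOGY: dict[str, set[str]] = {
    "Person": {"lives_in", "works_at", "likes"},
    "Organization": {"located_in"},
    "Place": set(),
}


@dataclass(frozen=True)
class Assertion:
    """A typed subject-property-value fact with a validity interval."""
    subject_id: str
    prop: str
    value: str
    valid_from: str                 # ISO date when the fact became true
    valid_to: Optional[str] = None  # None means the interval is still open


class EntityResolver:
    """Maps surface mentions to canonical entity ids via normalized aliases."""

    def __init__(self) -> None:
        self.alias_to_id: dict[str, str] = {}
        self.entity_class: dict[str, str] = {}

    @staticmethod
    def _norm(mention: str) -> str:
        # Case- and whitespace-insensitive matching only.
        return " ".join(mention.lower().split())

    def resolve(self, mention: str, cls: str) -> str:
        key = self._norm(mention)
        if key not in self.alias_to_id:
            # Unseen mention: mint a new canonical id tagged with its class.
            new_id = f"{cls.lower()}:{len(self.entity_class)}"
            self.alias_to_id[key] = new_id
            self.entity_class[new_id] = cls
        return self.alias_to_id[key]

    def add_alias(self, alias: str, entity_id: str) -> None:
        self.alias_to_id[self._norm(alias)] = entity_id


def make_assertion(resolver: EntityResolver, mention: str, cls: str,
                   prop: str, value: str, valid_from: str) -> Assertion:
    """Validate against the ontology, resolve the subject, and build the fact."""
    if cls not in ONTOLOGY:
        raise ValueError(f"unknown class {cls!r}")
    if prop not in ONTOLOGY[cls]:
        raise ValueError(f"class {cls} has no property {prop!r}")
    return Assertion(resolver.resolve(mention, cls), prop, value, valid_from)


r = EntityResolver()
a1 = make_assertion(r, "Alice", "Person", "lives_in", "Boston", "2023-01-10")
r.add_alias("Ally", a1.subject_id)
a2 = make_assertion(r, "  ally ", "Person", "works_at", "Acme", "2023-02-01")
print(a1.subject_id, a2.subject_id)  # person:0 person:0
```

Rejecting properties the ontology does not allow for a class is what keeps extracted facts typed; resolving "Alice" and "Ally" to one id is what keeps them entity-consistent.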

RESULTS

By the Numbers

Overall accuracy (LOCOMO)

88.88%

+3.50 over MIRIX

Temporal accuracy (LOCOMO)

90.63%

vs. 65.62% for MIRIX

Overall score (LongMemEval)

86.2%

+11.6 over Nemori (74.6%)

Accuracy (SealQA-Hard)

40.1%

+5.5 over o3 (34.6%)

On LOCOMO and LongMemEval, which test long-term conversational memory and long-context reasoning, APEX-MEM's scores of 88.88% and 86.2% indicate strong temporal and multi-hop reasoning over extended histories.

BENCHMARK

LOCOMO Category Type Evaluation Results

Overall accuracy on the LOCOMO question-answering benchmark.

BENCHMARK

APEX-MEM Ablations of different tools

Overall LOCOMO accuracy of the APEX-MEM Graph QnA agent with different tool subsets.

KEY INSIGHT

The Counterintuitive Finding

In the tool ablation, APEX-MEM with the full tool set reaches 87.0% on LOCOMO, while the GRAPHSQL-only configuration needs 3.3x more tool calls and reaches only 79.45%.

This is counterintuitive: one might expect structured SQL queries alone to suffice, but the ablation shows that combining SEARCH with GRAPHSQL is both more accurate and more efficient.

WHY IT MATTERS

What this unlocks for the field

APEX-MEM enables temporally coherent, entity-consistent conversational memory that resolves conflicting facts at query time instead of overwriting history.

Builders can create assistants that hold up over weeks-long, noisy interactions while still answering temporal and multi-hop questions, with 88.88% accuracy on LOCOMO and 86.2% on LongMemEval.
