Memory-Augmented Log Analysis with Phi-4-mini: Enhancing Threat Detection in Structured Security Logs

Authors: Anbi Guo, Mahfuza Farooque

2025

TL;DR

DM-RAG uses dual memories plus Bayesian fusion to push recall to 98.70% on UNSW-NB15 with Phi-4-mini.

THE PROBLEM

LLMs Miss Multistage Threats in Long Logs

Structured security logs used for APT detection span long periods. That history exceeds typical LLM context windows, and the domain lies outside the priors of general-purpose models, so attacks are missed.

Without DM-RAG, Phi-4-mini-based systems struggle on UNSW-NB15: recall is low and multistage attacks go undetected, which undermines intrusion detection and threat response.

HOW IT WORKS

DM-RAG — Dual-Memory Retrieval-Augmented Generation

DM-RAG’s core mechanism combines Short-Term Memory (STM), Long-Term Memory (LTM), a logistic regression confidence model, and Bayesian fusion around an instruction-tuned Phi-4-mini.

You can think of STM as RAM and LTM as disk, with FAISS acting like an indexed card catalog for historical attack patterns.

This dual-memory design lets DM-RAG reason over days of logs, combining recent and historical context in ways a plain context window cannot.
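
The RAM-and-disk analogy can be made concrete with a small sketch. The code below is only an illustration, not the authors' implementation: the class name `DualMemory`, its method names, and the toy two-dimensional embeddings are all hypothetical, and a plain cosine-similarity scan stands in for the FAISS index so the example runs with the standard library alone.

```python
import math
from collections import deque


class DualMemory:
    """Toy dual memory: a bounded STM window plus a searchable LTM store."""

    def __init__(self, stm_size=10):
        # STM keeps only the most recent summaries; the oldest drops out first.
        self.stm = deque(maxlen=stm_size)
        # LTM holds (embedding, summary) pairs. Assumption: a FAISS index
        # would replace this list in a real system.
        self.ltm = []

    def remember_recent(self, summary):
        self.stm.append(summary)

    def promote(self, embedding, summary):
        self.ltm.append((embedding, summary))

    def retrieve(self, query, top_k=3):
        """Return the top_k LTM summaries by cosine similarity to the query."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(self.ltm, key=lambda item: cosine(query, item[0]), reverse=True)
        return [summary for _, summary in ranked[:top_k]]


memory = DualMemory(stm_size=3)
for i in range(5):
    memory.remember_recent(f"summary {i}")
memory.promote([1.0, 0.0], "port scan followed by brute force")
memory.promote([0.0, 1.0], "benign DNS burst")

print(list(memory.stm))                        # only the 3 newest summaries remain
print(memory.retrieve([0.9, 0.1], top_k=1))    # closest historical pattern
```

The bounded `deque` gives the sliding-window behaviour of STM for free, while LTM grows only when something is explicitly promoted.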

DIAGRAM

Online Log Analysis and Memory Update Flow

This diagram shows how DM-RAG processes each UNSW-NB15 log entry, updates STM and LTM, and applies Bayesian fusion when STM is full.

DIAGRAM

UNSW-NB15 Evaluation Pipeline for DM-RAG

This diagram shows how DM-RAG is trained and evaluated on UNSW-NB15 with logistic regression, instruction tuning, and test-time dual-memory prompting.

PROCESS

How DM-RAG Handles Online Log Analysis with Memory-Augmented RAG and Bayesian Fusion

  1. 01

    Step 1: Confidence Model Preparation

    DM-RAG trains the logistic regression confidence model on normalized UNSW-NB15 features, defining score(x) as P(y = 1 | x) for anomaly confidence.

  2. 02

    Step 2: Online Log Analysis

    DM-RAG encodes each log with the Encoder E, retrieves from Long-Term Memory (LTM) via FAISS, merges STM and LTM into a prompt, and queries Phi-4-mini.

  3. 03

    Step 3: Memory Generation

    DM-RAG parses summary, confidence, and label, then appends them into Short-Term Memory (STM) as a sliding window of K = 10 recent summaries.

  4. 04

    Step 4: Memory Compression and Promotion

    When STM is full, DM-RAG compresses summaries with Phi-4-mini, applies Bayesian fusion to compute the fused confidence conf_fused, and promotes high-confidence summaries into Long-Term Memory (LTM).
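
The four steps above can be sketched as a single loop. This is a rough outline under several assumptions, not the paper's code: `fake_llm` is a hypothetical stand-in for Phi-4-mini, the weights, bias, and `PROMOTE_THRESHOLD` are made-up values, LTM retrieval is reduced to "take the last few entries" instead of a FAISS lookup, compression is a simple string join, the fused confidence is a plain average used as a placeholder for Bayesian fusion, and clearing STM after promotion is a guess at behaviour the summary does not specify.

```python
import json
import math
from collections import deque

K = 10                   # STM window size, as stated in Step 3
PROMOTE_THRESHOLD = 0.8  # hypothetical cut-off; the summary gives no value


def score(x, weights, bias):
    """Step 1: logistic regression confidence, P(y = 1 | x)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def fake_llm(prompt):
    """Hypothetical stand-in for Phi-4-mini that returns strict JSON."""
    label = "attack" if "sbytes=9000" in prompt else "normal"
    return json.dumps({"summary": f"flow judged {label}", "confidence": 0.9, "label": label})


def analyze(log_line, features, stm, ltm, weights, bias, llm=fake_llm):
    # Step 2: merge recent (STM) and historical (LTM) context into one prompt.
    prompt = "\n".join(
        ["Recent:"] + [m["summary"] for m in stm]
        + ["Historical:"] + [m["summary"] for m in ltm[-3:]]  # simplification, no FAISS
        + ["Log:", log_line]
    )
    # Step 3: parse summary, confidence, and label, then append to STM.
    entry = json.loads(llm(prompt))
    entry["score"] = score(features, weights, bias)
    stm.append(entry)
    # Step 4: when the window is full, compress and promote if confident enough.
    if len(stm) == stm.maxlen:
        fused = sum(m["score"] for m in stm) / len(stm)  # placeholder, NOT Bayesian fusion
        if fused >= PROMOTE_THRESHOLD:
            ltm.append({"summary": " | ".join(m["summary"] for m in stm), "conf": fused})
        stm.clear()  # assumption: start a fresh window after promotion
    return entry


stm, ltm = deque(maxlen=K), []
weights, bias = [2.0, 1.5], -0.5  # made-up coefficients for illustration
for _ in range(K):
    last = analyze("proto=tcp sbytes=9000", [1.0, 1.0], stm, ltm, weights, bias)

print(last["label"], len(stm), len(ltm))
```

After ten suspicious flows the window fills, the averaged score clears the threshold, and one compressed entry lands in LTM.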

KEY CONTRIBUTIONS

Key Contributions

  • 01

    Dual-Memory Retrieval-Augmented Generation (DM-RAG)

    DM-RAG introduces interacting Short-Term Memory (STM) and Long-Term Memory (LTM) modules around Phi-4-mini, enabling high-recall log anomaly detection with 98.70% recall and 69.59% F1 on UNSW-NB15.

  • 02

    Bayesian Fusion for Memory Promotion

    DM-RAG uses Bayesian fusion over Beta-modeled features to compute fused anomaly confidence, deciding which STM summaries are compressed and promoted into LTM.

  • 03

    Instruction-Tuned Phi-4-mini for Structured Logs

    DM-RAG instruction-tunes Phi-4-mini with a strict JSON prompt design, combining STM and LTM context to produce interpretable summaries and attack labels for structured security logs.
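
This summary does not give the fusion formula, so the following is only one plausible reading of "Bayesian fusion over Beta-modeled features", offered as an illustration. It assumes each confidence value contributes pseudo-counts to a Beta distribution and that the fused confidence is the posterior mean; the function name `beta_fuse` and the `strength` and prior parameters are hypothetical.

```python
def beta_fuse(confidences, strength=10.0, prior_a=1.0, prior_b=1.0):
    """Fuse confidences in [0, 1] into one value via a Beta posterior mean.

    Assumption (not from the paper): each confidence c adds strength * c
    pseudo-successes and strength * (1 - c) pseudo-failures to a
    Beta(prior_a, prior_b) prior.
    """
    a, b = prior_a, prior_b
    for c in confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        a += strength * c
        b += strength * (1.0 - c)
    return a / (a + b)


# Example: an LLM confidence of 0.9 fused with a logistic score of 0.8.
fused = beta_fuse([0.9, 0.8])
print(round(fused, 3))  # (1 + 9 + 8) / (2 + 20) = 18 / 22
```

Under this reading, a summary would be promoted to LTM when the fused value exceeds some threshold; with no evidence at all, the uniform prior yields 0.5.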

RESULTS

By the Numbers

Accuracy: 53.64% (-3.60 relative to Phi-4 + RAG (MITRE))

Precision: 53.74% (+8.82 relative to LoRA Fine-tuned)

Recall: 98.70% (+57.13 relative to Phi-4 + RAG (MITRE))

F1 Score: 69.59% (+17.89 relative to Phi-4 + RAG (MITRE))

On the UNSW-NB15 intrusion detection benchmark, DM-RAG is evaluated on accuracy, precision, recall, and F1. The 98.70% recall and 69.59% F1 show that DM-RAG captures almost all attacks. The cost is precision: at 53.74%, nearly half of its alerts are false positives, and its accuracy trails the Phi-4 + RAG (MITRE) baseline by 3.60.

BENCHMARK

Performance Comparison on UNSW-NB15 Test Set

F1 Score on UNSW-NB15 for DM-RAG and Phi-4-mini baselines.

KEY INSIGHT

The Counterintuitive Finding

DM-RAG achieves 98.70% recall with only 53.74% precision, while zero-shot Phi-4-mini reaches 98.91% precision but just 0.20% recall.

This is surprising because high precision is usually desirable, yet DM-RAG shows that for security logs, aggressively maximizing recall is more valuable than ultra-conservative precision.
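
A quick calculation shows why near-perfect precision does not rescue the zero-shot model. F1 is the harmonic mean of precision and recall, so it collapses when either one is close to zero. The snippet below applies the standard formula to the figures quoted above; the zero-shot F1 it prints is derived here from the reported precision and recall, not quoted from the paper.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


dm_rag = f1(0.5374, 0.9870)      # DM-RAG: 53.74% precision, 98.70% recall
zero_shot = f1(0.9891, 0.0020)   # zero-shot Phi-4-mini: 98.91% precision, 0.20% recall

print(f"DM-RAG F1:    {dm_rag:.4f}")     # 0.6959, matching the reported 69.59%
print(f"Zero-shot F1: {zero_shot:.4f}")  # 0.0040
```

A detector that flags almost nothing can be right nearly every time it does flag something and still miss nearly every attack, which is what the 0.20% recall reflects.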

WHY IT MATTERS

What this unlocks for the field

DM-RAG unlocks high-recall, interpretable, long-horizon reasoning over structured security logs using a compact Phi-4-mini backbone and dual-memory RAG.

Builders can now deploy lightweight, memory-augmented LLM detectors that track multistage attacks over time without massive external corpora or heavy fine-tuning.
