Knowledge Packs: Zero-Token Knowledge Delivery via KV Cache Injection
Andrey Pustovit · 2026
Knowledge Packs deliver retrieved knowledge and steering as pre-computed KV states instead of prompt tokens, building on four components: KV Cache Injection, KV–Prefix Equivalence, Banked Routing, and KV Composition. On HotpotQA, Knowledge Packs’ KV-chat matches RAG at 65.2% exact match (EM) on Qwen3-8B with 0/500 divergences while eliminating 284 tokens of retrieval text per query.
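The idea behind KV–Prefix Equivalence can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a single attention head, a single layer, no positional encoding, and random embeddings, and every name in it (`causal_attention`, `pack_k`, `pack_v`, and so on) is hypothetical. It only shows that, under causal masking, attending over cached keys and values for a knowledge prefix gives the same outputs for the query tokens as re-processing the prefix as ordinary input.

```python
import math
import random


def matvec(x, W):
    """Multiply a vector x (length d_in) by a matrix W (d_in x d_out)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def attend(q, keys, values):
    """Softmax attention of one query vector over the given keys and values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def causal_attention(xs, Wq, Wk, Wv, past_k=None, past_v=None):
    """Process tokens left to right; each token sees the cache plus earlier tokens.

    Returns (outputs, keys, values), where keys and values include any cache
    passed in through past_k and past_v.
    """
    keys = list(past_k or [])
    values = list(past_v or [])
    outputs = []
    for x in xs:
        keys.append(matvec(x, Wk))
        values.append(matvec(x, Wv))
        outputs.append(attend(matvec(x, Wq), keys, values))
    return outputs, keys, values


rng = random.Random(0)
d = 4  # toy embedding width (assumption)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


Wq, Wk, Wv = rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d)
knowledge = rand_matrix(5, d)  # stand-in for retrieved-text token embeddings
query = rand_matrix(3, d)      # stand-in for the user's query tokens

# Baseline: the knowledge is part of the prompt and is processed with the query.
full_out, _, _ = causal_attention(knowledge + query, Wq, Wk, Wv)

# Pack: compute the knowledge keys and values once, ahead of time.
_, pack_k, pack_v = causal_attention(knowledge, Wq, Wk, Wv)

# Injection: only the query tokens are processed, on top of the cached states.
injected_out, _, _ = causal_attention(query, Wq, Wk, Wv, pack_k, pack_v)

max_diff = max(
    abs(a - b)
    for row_full, row_inj in zip(full_out[len(knowledge):], injected_out)
    for a, b in zip(row_full, row_inj)
)
print("max difference on query positions:", max_diff)
```

In this toy setting the difference is zero up to floating-point rounding, because the cached keys and values are exactly what the baseline would have computed for the same prefix. A real transformer adds details the sketch leaves out: the cache must be stored per layer and per head, and models with positional encodings such as RoPE need the query positions offset by the prefix length.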