arxiv:2503.04973

Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning

Published on Mar 6 · Submitted by giulio98 on Mar 11

Abstract

Incorporating external knowledge in large language models (LLMs) enhances their utility across diverse applications, but existing methods have trade-offs. Retrieval-Augmented Generation (RAG) fetches evidence via similarity search, but key information may fall outside the top-ranked results. Long-context models can process multiple documents but are computationally expensive and limited by context window size. Inspired by students condensing study material for open-book exams, we propose task-aware key-value (KV) cache compression, which compresses external knowledge in a zero- or few-shot setup. This enables LLMs to reason efficiently over a compacted representation of all relevant information. Experiments show our approach outperforms both RAG and task-agnostic compression methods. On LongBench v2, it improves accuracy by up to 7 absolute points over RAG with a 30x compression rate, while reducing inference latency from 0.43s to 0.16s. A synthetic dataset highlights that RAG performs well when sparse evidence suffices, whereas task-aware compression is superior for broad knowledge tasks.
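To make the idea concrete, here is a minimal PyTorch sketch of task-aware KV cache pruning for a single attention layer. It is an illustrative heuristic under assumed tensor shapes, not the authors' exact algorithm: cached positions are scored by the attention mass they receive from the task/question tokens, and only the top-scoring fraction (e.g. 1/30 of the positions for a 30x compression rate) is kept.

```python
import torch

def compress_kv_cache(keys, values, task_queries, keep_ratio=1 / 30):
    """Prune one layer's KV cache in a task-aware way (illustrative heuristic only).

    keys, values:  [batch, heads, seq_len, head_dim]   cached KV over the external documents
    task_queries:  [batch, heads, task_len, head_dim]  query states of the task / question tokens
    keep_ratio:    fraction of cached positions to keep (1/30 ~= 30x compression)
    """
    head_dim = keys.size(-1)
    # Attention of the task tokens over every cached position.
    attn = torch.softmax(task_queries @ keys.transpose(-1, -2) / head_dim**0.5, dim=-1)
    # Importance of a cached position = total attention mass it receives across heads and task tokens.
    importance = attn.sum(dim=(1, 2))                    # [batch, seq_len]
    budget = max(1, int(keys.size(2) * keep_ratio))
    keep = importance.topk(budget, dim=-1).indices.sort(dim=-1).values  # preserve original order
    keep = keep[:, None, :, None].expand(-1, keys.size(1), -1, head_dim)
    return keys.gather(2, keep), values.gather(2, keep)

# Example: a 4096-token document cache compressed ~30x along the sequence axis.
k = torch.randn(1, 8, 4096, 64)
v = torch.randn(1, 8, 4096, 64)
q_task = torch.randn(1, 8, 32, 64)        # hypothetical 32-token question prompt
k_small, v_small = compress_kv_cache(k, v, q_task)
print(k_small.shape)                      # torch.Size([1, 8, 136, 64])
```

In a full pipeline, the compressed keys and values would be passed back to the model as its past key/values, so the question is answered against the condensed representation rather than the full documents.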

Community


Nice work

I tried the demo on the "Large Language Diffusion Models" arXiv paper (2502.09992v2) using 30x compression and it failed (it started hallucinating about Llama); at 4x compression it worked perfectly.

But in the paper you claim that 30x compression can match or surpass RAG performance. Are there other considerations at play that make this work well?
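For rough intuition on the budgets behind this observation, a quick back-of-the-envelope calculation is below; the 20k-token figure is an assumed paper length, not a measured value.

```python
# Hypothetical token budget for a single arXiv paper at the two compression rates above.
paper_tokens = 20_000                      # assumed length after tokenization
for rate in (4, 30):
    print(f"{rate}x compression keeps ~{paper_tokens // rate} cached positions")
# 4x  -> ~5000 positions kept
# 30x -> ~666 positions kept
```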


