ALR^2: A Retrieve-then-Reason Framework for Long-context Question Answering
Abstract
The context window of large language models (LLMs) has been extended significantly in recent years. However, while the context length that an LLM can process has grown, its ability to reason accurately over that context has degraded noticeably. This occurs because modern LLMs are often overwhelmed by the vast amount of information in the context; when answering questions, the model must identify and reason over relevant evidence sparsely distributed throughout the text. To alleviate the challenge of long-context reasoning, we develop a retrieve-then-reason framework, enabling LLMs to reason over relevant evidence collected during an intermediate retrieval step. We find that modern LLMs struggle to accurately retrieve relevant facts and instead often hallucinate "retrieved facts", resulting in flawed reasoning and incorrect answers. To address these issues, we introduce ALR^2, a method that augments the long-context reasoning capability of LLMs via an explicit two-stage procedure, i.e., aligning LLMs with the objectives of both retrieval and reasoning. We demonstrate the efficacy of ALR^2 in mitigating performance degradation on long-context reasoning tasks. Through extensive experiments on long-context QA benchmarks, we find that our method outperforms competitive baselines by large margins, achieving EM gains of at least 8.4 and 7.9 on the long-context versions of the HotpotQA and SQuAD datasets, respectively.
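To make the two-stage procedure concrete, below is a minimal sketch of a retrieve-then-reason prompting pipeline in the spirit of the abstract. The `generate` function is a placeholder for any LLM completion API, and the prompt wording, function names, and the verbatim-grounding check are illustrative assumptions, not the paper's actual prompts or training objective (ALR^2 aligns the model to these stages via training, which this sketch does not cover).

```python
def generate(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., an OpenAI- or HF-style chat API)."""
    raise NotImplementedError


def retrieve_then_reason(context: str, question: str) -> str:
    # Stage 1 (retrieval): ask the model to quote evidence verbatim so that
    # "retrieved facts" can be checked against the source text instead of
    # being hallucinated.
    retrieval_prompt = (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "List, verbatim, the sentences from the context that are needed "
        "to answer the question. Output one sentence per line."
    )
    evidence = generate(retrieval_prompt)

    # Simple guard against hallucinated evidence: keep only lines that
    # actually occur in the context (an assumption of this sketch, not a
    # step the paper necessarily prescribes).
    grounded = "\n".join(
        line for line in evidence.splitlines()
        if line.strip() and line.strip() in context
    )

    # Stage 2 (reasoning): reason over the much shorter retrieved evidence
    # rather than the full long context.
    reasoning_prompt = (
        f"Evidence:\n{grounded}\n\n"
        f"Question: {question}\n\n"
        "Answer the question step by step using only the evidence above, "
        "then give the final answer on the last line."
    )
    return generate(reasoning_prompt)
```

The design point the sketch illustrates is the separation of concerns: the retrieval stage reduces a long context to a short, checkable evidence set, and the reasoning stage operates only on that set, which is what mitigates the degradation over long inputs described above.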