Papers
arxiv:1809.09528

ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters

Published on Sep 25, 2018
Authors:
,
,

Abstract

To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. ComQA questions come from the WikiAnswers community QA platform, which typically contains questions that are not satisfactorily answerable by existing search engine technology. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/1809.09528 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/1809.09528 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/1809.09528 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.