UIT-ViCoV19QA: A Dataset for COVID-19 Community-based Question Answering on Vietnamese Language
Authors: Triet Minh Thai, Ngan Ha-Thao Chu, Anh Tuan Vo, and Son T. Luu.
Description: The UIT-ViCoV19QA dataset comprises 4,500 Vietnamese question-answer pairs about the COVID-19 pandemic, collected from trusted medical FAQ sources. Each question has at least one answer and at most four unique paraphrased answers.
The statistics of the dataset are shown in the table below.
| No. | Stats. | Train | Dev. | Test | All |
|---|---|---|---|---|---|
| Answer 1 | Number of question-answer pairs | 3500 | 500 | 500 | 4500 |
| | Average question length | 31.44 | 33.66 | 32.32 | 31.79 |
| | Average answer length | 120.53 | 116.04 | 118.11 | 119.76 |
| | Question vocabulary size | 4396 | 1869 | 1770 | 4924 |
| | Answer vocabulary size | 8537 | 3689 | 3367 | 9411 |
| Answer 2 | Number of question-answer pairs | 1390 | 209 | 201 | 1800 |
| | Average question length | 35.56 | 39.22 | 39.72 | 36.45 |
| | Average answer length | 40.54 | 39.25 | 42.73 | 40.64 |
| | Question vocabulary size | 2883 | 1269 | 1207 | 3305 |
| | Answer vocabulary size | 2632 | 1098 | 1129 | 2949 |
| Answer 3 | Number of question-answer pairs | 542 | 79 | 79 | 700 |
| | Average question length | 34.77 | 36.7 | 39.28 | 35.49 |
| | Average answer length | 28.68 | 26.43 | 30.89 | 28.67 |
| | Question vocabulary size | 1836 | 717 | 693 | 2111 |
| | Answer vocabulary size | 1554 | 503 | 585 | 1753 |
| Answer 4 | Number of question-answer pairs | 272 | 39 | 39 | 350 |
| | Average question length | 36.57 | 37.59 | 42.15 | 37.1 |
| | Average answer length | 29.75 | 29.03 | 35.72 | 30.25 |
| | Question vocabulary size | 1315 | 470 | 460 | 1519 |
| | Answer vocabulary size | 924 | 353 | 374 | 1075 |
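
As a quick sanity check on the pair counts in the table above, the following Python sketch verifies that the Train, Dev., and Test columns add up to the All column, and derives how many questions have exactly one, two, three, or four answers. It assumes that the "Answer k" row counts the questions that have a k-th answer, which is an interpretation of the table and is not stated explicitly in this README. The names `pair_counts`, `check_split_totals`, and `questions_by_answer_count` are hypothetical helpers for illustration, not part of the dataset's code.

```python
# Question-answer pair counts per answer slot, copied from the statistics table.
pair_counts = {
    "Answer 1": {"train": 3500, "dev": 500, "test": 500, "all": 4500},
    "Answer 2": {"train": 1390, "dev": 209, "test": 201, "all": 1800},
    "Answer 3": {"train": 542, "dev": 79, "test": 79, "all": 700},
    "Answer 4": {"train": 272, "dev": 39, "test": 39, "all": 350},
}


def check_split_totals(counts):
    """Return, per answer slot, whether train + dev + test equals the 'all' column."""
    return {
        slot: row["train"] + row["dev"] + row["test"] == row["all"]
        for slot, row in counts.items()
    }


def questions_by_answer_count(counts):
    """Number of questions with exactly k answers, for k = 1..4.

    Assumption: the 'Answer k' row counts questions that have a k-th answer,
    so questions with exactly k answers = count(k) - count(k + 1).
    """
    totals = [counts[f"Answer {k}"]["all"] for k in range(1, 5)]
    exact = {}
    for k, total in enumerate(totals, start=1):
        next_total = totals[k] if k < len(totals) else 0
        exact[k] = total - next_total
    return exact


if __name__ == "__main__":
    print(check_split_totals(pair_counts))
    print(questions_by_answer_count(pair_counts))  # {1: 2700, 2: 1100, 3: 350, 4: 350}
    # Total number of answers across all questions under the same assumption.
    print(sum(row["all"] for row in pair_counts.values()))  # 7350
```

Under that assumption, most questions (2,700 of 4,500) have a single answer, and the dataset contains 7,350 answers in total.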
Link to publication: https://aclanthology.org/2022.paclic-1.88/.
The dataset is used only for research purposes.
Some parts of the source code were inherited from the repository at https://github.com/barshana-banerjee/ParaQA_Experiments.git.
Contact information
Mr. Triet Minh Thai: [email protected]
Mr. Son T. Luu: [email protected]
Citation
@inproceedings{thai-etal-2022-uit,
title = "{UIT}-{V}i{C}o{V}19{QA}: A Dataset for {COVID}-19 Community-based Question Answering on {V}ietnamese Language",
author = "Thai, Triet and Thao-Ha, Ngan Chu and Vo, Anh and Luu, Son",
booktitle = "Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation",
month = oct,
year = "2022",
address = "Manila, Philippines",
publisher = "De La Salle University",
url = "https://aclanthology.org/2022.paclic-1.88",
pages = "801--810",
}