UIT-ViCoV19QA: A Dataset for COVID-19 Community-based Question Answering on Vietnamese Language

Authors: Triet Minh Thai, Ngan Ha-Thao Chu, Anh Tuan Vo, and Son T. Luu.

Description: The UIT-ViCoV19QA dataset comprises 4,500 Vietnamese question-answer pairs about the COVID-19 pandemic, collected from trusted medical FAQ sources. Each question has at least one answer and at most four unique paraphrased answers.

The statistics of the dataset are shown in the table below.

| No. | Stats. | Train | Dev. | Test | All |
|---|---|---|---|---|---|
| Answer 1 | Number of question-answer pairs | 3500 | 500 | 500 | 4500 |
| | Average question length | 31.44 | 33.66 | 32.32 | 31.79 |
| | Average answer length | 120.53 | 116.04 | 118.11 | 119.76 |
| | Question vocabulary size | 4396 | 1869 | 1770 | 4924 |
| | Answer vocabulary size | 8537 | 3689 | 3367 | 9411 |
| Answer 2 | Number of question-answer pairs | 1390 | 209 | 201 | 1800 |
| | Average question length | 35.56 | 39.22 | 39.72 | 36.45 |
| | Average answer length | 40.54 | 39.25 | 42.73 | 40.64 |
| | Question vocabulary size | 2883 | 1269 | 1207 | 3305 |
| | Answer vocabulary size | 2632 | 1098 | 1129 | 2949 |
| Answer 3 | Number of question-answer pairs | 542 | 79 | 79 | 700 |
| | Average question length | 34.77 | 36.7 | 39.28 | 35.49 |
| | Average answer length | 28.68 | 26.43 | 30.89 | 28.67 |
| | Question vocabulary size | 1836 | 717 | 693 | 2111 |
| | Answer vocabulary size | 1554 | 503 | 585 | 1753 |
| Answer 4 | Number of question-answer pairs | 272 | 39 | 39 | 350 |
| | Average question length | 36.57 | 37.59 | 42.15 | 37.1 |
| | Average answer length | 29.75 | 29.03 | 35.72 | 30.25 |
| | Question vocabulary size | 1315 | 470 | 460 | 1519 |
| | Answer vocabulary size | 924 | 353 | 374 | 1075 |
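
For readers who want to compute statistics of this kind on their own copy of the data, the following is a minimal sketch. This README does not state the unit in which lengths and vocabulary sizes are measured, so splitting on whitespace is an assumption here and the output may not match the table. The function name `split_stats` and the two toy pairs are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def split_stats(pairs):
    """Compute five summary statistics for a list of (question, answer) pairs.

    Assumption: tokens are obtained by whitespace splitting. The dataset's
    own statistics may rely on a different tokenization.
    """
    q_tokens = [question.split() for question, _ in pairs]
    a_tokens = [answer.split() for _, answer in pairs]
    return {
        "num_pairs": len(pairs),
        "avg_question_length": round(mean(len(toks) for toks in q_tokens), 2),
        "avg_answer_length": round(mean(len(toks) for toks in a_tokens), 2),
        # Vocabulary size = number of distinct tokens across the split.
        "question_vocab_size": len({tok for toks in q_tokens for tok in toks}),
        "answer_vocab_size": len({tok for toks in a_tokens for tok in toks}),
    }


# Hypothetical toy pairs, for illustration only (not from the dataset).
toy_pairs = [
    ("COVID-19 là gì ?", "COVID-19 là bệnh do virus SARS-CoV-2 gây ra ."),
    ("Triệu chứng là gì ?", "Sốt , ho và khó thở ."),
]

stats = split_stats(toy_pairs)
print(stats)
```

Running the same function separately on the train, dev, and test pairs of one answer set would give one row group of a table like the one above, under the tokenization assumption stated.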

Link to publication: https://aclanthology.org/2022.paclic-1.88/.

The dataset is intended for research purposes only.

Parts of the source code are derived from the ParaQA_Experiments repository at https://github.com/barshana-banerjee/ParaQA_Experiments.git.

Contact information

Mr. Triet Minh Thai: [email protected]
Mr. Son T. Luu: [email protected]

Citation

@inproceedings{thai-etal-2022-uit,
    title = "{UIT}-{V}i{C}o{V}19{QA}: A Dataset for {COVID}-19 Community-based Question Answering on {V}ietnamese Language",
    author = "Thai, Triet  and Thao-Ha, Ngan Chu  and Vo, Anh  and Luu, Son",
    booktitle = "Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation",
    month = oct,
    year = "2022",
    address = "Manila, Philippines",
    publisher = "De La Salle University",
    url = "https://aclanthology.org/2022.paclic-1.88",
    pages = "801--810",
}