# UTAustin-AIHealth

Welcome to **UTAustin-AIHealth** – a hub dedicated to advancing research in medical AI. 
This repo contains the **MedHallu** dataset, which underpins our recent work:

**MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models**

MedHallu is a benchmark for evaluating how well large language models detect hallucinations in medical question-answering tasks.
The dataset is organized into two distinct splits:

- **pqa_labeled:** Contains 1,000 high-quality, human-annotated samples derived from PubMedQA.
- **pqa_artificial:** Contains 9,000 samples generated via an automated pipeline from PubMedQA.

---

## Setup Environment

To work with the MedHallu dataset, please install the Hugging Face `datasets` library using pip:

```bash
pip install datasets
```

## How to Use MedHallu

**Downloading the Dataset:**  
```python
from datasets import load_dataset

# Load the 'pqa_labeled' split: 1,000 high-quality, human-annotated samples.
medhallu_labeled = load_dataset("UTAustin-AIHealth/MedHallu", "pqa_labeled")

# Load the 'pqa_artificial' split: 9,000 samples generated via an automated pipeline.
medhallu_artificial = load_dataset("UTAustin-AIHealth/MedHallu", "pqa_artificial")
```
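
Once loaded, it can help to confirm what was actually downloaded before building on it. The following is a minimal sketch, not part of the official MedHallu tooling: `summarize_splits` and `print_medhallu_summary` are hypothetical helper names, and the sketch assumes `load_dataset` returns a `DatasetDict`-style mapping from split names to datasets, which is its usual behavior when no `split` argument is passed.

```python
def summarize_splits(dataset_dict):
    """Map each split name to its row count and column names.

    Works on any mapping whose values support len(); column names are
    read from a `column_names` attribute when the value has one.
    """
    summary = {}
    for split_name, split in dataset_dict.items():
        summary[split_name] = {
            "num_rows": len(split),
            "columns": list(getattr(split, "column_names", [])),
        }
    return summary


def print_medhallu_summary():
    """Download both MedHallu subsets and print a short summary of each."""
    # Imported here so summarize_splits stays usable without `datasets` installed.
    from datasets import load_dataset

    for subset in ("pqa_labeled", "pqa_artificial"):
        loaded = load_dataset("UTAustin-AIHealth/MedHallu", subset)
        for split_name, info in summarize_splits(loaded).items():
            print(subset, split_name, info["num_rows"], info["columns"])
```

Calling `print_medhallu_summary()` downloads both subsets on first use, so it needs network access. The reported row counts can then be compared against the 1,000 and 9,000 samples described above.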

---


## License

This dataset and associated resources are distributed under the [MIT License](https://opensource.org/license/mit/).

## Citations

If you find MedHallu useful in your research, please consider citing our work:

```bibtex
@misc{pandit2025medhallucomprehensivebenchmarkdetecting,
      title={MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models}, 
      author={Shrey Pandit and Jiawei Xu and Junyuan Hong and Zhangyang Wang and Tianlong Chen and Kaidi Xu and Ying Ding},
      year={2025},
      eprint={2502.14302},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14302}, 
}
```

## Contact

For further information or inquiries about MedHallu, please reach out at [email protected]