# UTAustin-AIHealth
Welcome to **UTAustin-AIHealth** – a hub dedicated to advancing research in medical AI.
This repo contains the **MedHallu** dataset, which underpins our recent work:
**MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models**
MedHallu is a rigorously designed benchmark intended to evaluate large language models' ability to detect hallucinations in medical question-answering tasks.
The dataset is organized into two distinct splits:
- **pqa_labeled:** Contains 1,000 high-quality, human-annotated samples derived from PubMedQA.
- **pqa_artificial:** Contains 9,000 samples generated via an automated pipeline from PubMedQA.
---
## Setup Environment
To work with the MedHallu dataset, please install the Hugging Face `datasets` library using pip:
```bash
pip install datasets
```
## How to Use MedHallu
**Downloading the Dataset:**
```python
from datasets import load_dataset
# Load the 'pqa_labeled' subset (selected by configuration name): 1,000 high-quality, human-annotated samples.
medhallu_labeled = load_dataset("UTAustin-AIHealth/MedHallu", "pqa_labeled")
# Load the 'pqa_artificial' subset (selected by configuration name): 9,000 samples generated via an automated pipeline.
medhallu_artificial = load_dataset("UTAustin-AIHealth/MedHallu", "pqa_artificial")
```
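**Scoring a Detector (illustrative sketch):**

Since MedHallu is meant to evaluate hallucination detection, a typical next step is to compare a model's binary predictions against gold labels. The snippet below is a minimal sketch of that scoring step, not the paper's evaluation code. The function name `detection_scores` and the example labels are hypothetical, and it assumes you have already mapped each sample to a 0/1 label where 1 means "hallucinated".

```python
def detection_scores(gold, predicted):
    """Compute precision, recall, and F1 for the 'hallucinated' class.

    `gold` and `predicted` are equal-length sequences of 0/1 labels,
    where 1 means the answer is hallucinated (an assumed encoding).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # true positives
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # false positives
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for six samples, for illustration only.
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
scores = detection_scores(gold, predicted)
print(scores)
```

In practice, `gold` would come from the dataset and `predicted` from the model under evaluation; how each sample is turned into a label depends on your prompt and parsing setup.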
---
## License
This dataset and associated resources are distributed under the [MIT License](https://opensource.org/license/mit/).
## Citations
If you find MedHallu useful in your research, please consider citing our work:
```bibtex
@misc{pandit2025medhallucomprehensivebenchmarkdetecting,
title={MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models},
author={Shrey Pandit and Jiawei Xu and Junyuan Hong and Zhangyang Wang and Tianlong Chen and Kaidi Xu and Ying Ding},
year={2025},
eprint={2502.14302},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.14302},
}
```
## Contact
For further information or inquiries about MedHallu, please reach out at [email protected]