Transformers · Safetensors · English · mergekit · Merge · Inference Endpoints
zli12321 committed (verified)
Commit c9fe911 · 1 Parent(s): 59d4d4c

Update README.md

Files changed (1)
  1. README.md +23 -10
README.md CHANGED

@@ -10,6 +10,7 @@ license: apache-2.0
 datasets:
 - prometheus-eval/Preference-Collection
 - prometheus-eval/Feedback-Collection
+- zli12321/pedants_qa_evaluation_bench
 language:
 - en
 ---
@@ -101,16 +102,6 @@ An instruction (might include an Input inside it), a response to evaluate, and a
 # Citations
 
 
-```bibtex
-@misc{kim2023prometheus,
-      title={Prometheus: Inducing Fine-grained Evaluation Capability in Language Models},
-      author={Seungone Kim and Jamin Shin and Yejin Cho and Joel Jang and Shayne Longpre and Hwaran Lee and Sangdoo Yun and Seongjin Shin and Sungdong Kim and James Thorne and Minjoon Seo},
-      year={2023},
-      eprint={2310.08491},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL}
-}
-```
 ```bibtex
 @misc{kim2024prometheus,
       title={Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models},
@@ -120,4 +111,26 @@ An instruction (might include an Input inside it), a response to evaluate, and a
       archivePrefix={arXiv},
       primaryClass={cs.CL}
 }
+```
+```bibtex
+@inproceedings{li-etal-2024-pedants,
+    title = "{PEDANTS}: Cheap but Effective and Interpretable Answer Equivalence",
+    author = "Li, Zongxia and
+      Mondal, Ishani and
+      Nghiem, Huy and
+      Liang, Yijun and
+      Boyd-Graber, Jordan Lee",
+    editor = "Al-Onaizan, Yaser and
+      Bansal, Mohit and
+      Chen, Yun-Nung",
+    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
+    month = nov,
+    year = "2024",
+    address = "Miami, Florida, USA",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2024.findings-emnlp.548/",
+    doi = "10.18653/v1/2024.findings-emnlp.548",
+    pages = "9373--9398",
+    abstract = "Question answering (QA) can only make progress if we know if an answer is correct, but current answer correctness (AC) metrics struggle with verbose, free-form answers from large language models (LLMs). There are two challenges with current short-form QA evaluations: a lack of diverse styles of evaluation data and an over-reliance on expensive and slow LLMs. LLM-based scorers correlate better with humans, but this expensive task has only been tested on limited QA datasets. We rectify these issues by providing rubrics and datasets for evaluating machine QA adopted from the Trivia community. We also propose an efficient, and interpretable QA evaluation that is more stable than an exact match and neural methods (BERTScore)."
+}
 ```
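Since the commit adds `zli12321/pedants_qa_evaluation_bench` to the model card's `datasets` metadata, a minimal sketch of pulling that dataset with the Hugging Face `datasets` library is shown below; the `train` split name and the way records are inspected are assumptions, not something specified in this diff.

```python
# Minimal sketch: load the dataset newly referenced in the model card metadata.
# Assumption: the repo exposes a "train" split; field names are unknown here,
# so we inspect the first record instead of relying on specific columns.
from datasets import load_dataset

ds = load_dataset("zli12321/pedants_qa_evaluation_bench", split="train")

print(ds)               # dataset summary: number of rows and column names
print(ds.column_names)  # actual fields, whatever they are in this repo
print(ds[0])            # first record, for a quick sanity check
```

If the repo uses a different split layout, calling `load_dataset` without `split` returns a `DatasetDict` keyed by whatever splits are actually available.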