---
license: llama2
language:
- en
pipeline_tag: text-classification
---


# **ReasonEval-34B Model Card**

## Model Description

`ReasonEval-34B` is a 34B-parameter decoder-only language model fine-tuned from [`llemma_34b`](https://huggingface.co/EleutherAI/llemma_34b). Given a mathematical problem and a solution, `ReasonEval-34B` assesses the problem-solving process step by step from the following perspectives:
- **Validity**: The step contains no mistakes in calculation and logic.
- **Redundancy**: The step lacks utility in solving the problem but is still valid.
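
These two perspectives translate into step-level scores. As a minimal sketch, assuming the three-class step labeling used in the paper (where a *neutral* step is valid but redundant):

```python
# Sketch only: assumes each step receives a probability distribution over
# three classes -- positive (valid and useful), neutral (valid but
# redundant), and negative (invalid) -- following the paper's formulation.
def step_scores(p_positive: float, p_neutral: float, p_negative: float):
    validity = p_positive + p_neutral  # probability the step has no mistakes
    redundancy = p_neutral             # probability the step is valid but not useful
    return validity, redundancy
```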


With ReasonEval, you can

- 📏 quantify the quality of reasoning steps without relying on human annotators or closed-source models.

- 🤖 flag potentially invalid or redundant steps in a solution, even when the final answer is correct.

- 🛠️ select high-quality training data for downstream tasks (e.g., fine-tuning).

## Model Details

* **Model type**: `ReasonEval-34B`'s architecture is identical to [`llemma_34b`](https://huggingface.co/EleutherAI/llemma_34b), except that the language-modeling head for next-token prediction is replaced with a classification head that outputs the probability of each reasoning-step class (see the sketch after this list).
* **Language(s)**: English
* **Paper**: [Evaluating Mathematical Reasoning Beyond Accuracy](https://arxiv.org/pdf/2404.05692.pdf)
* **Github**: [https://github.com/GAIR-NLP/ReasonEval](https://github.com/GAIR-NLP/ReasonEval)
* **Finetuned from model**: [https://huggingface.co/EleutherAI/llemma_34b](https://huggingface.co/EleutherAI/llemma_34b)
* **Fine-tuning Data**: [PRM800K](https://github.com/openai/prm800k)
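
As a rough illustration of the head swap described under *Model type* (a sketch only, not the exact ReasonEval implementation; the number of step classes is an assumption here):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Load the base model that ReasonEval-34B was fine-tuned from.
base = AutoModelForCausalLM.from_pretrained("EleutherAI/llemma_34b")

# Replace the vocabulary-sized language-modeling head with a small linear
# head that scores each reasoning-step class. Three classes is an assumption
# (e.g., valid-and-useful / valid-but-redundant / invalid).
num_step_classes = 3
base.lm_head = nn.Linear(base.config.hidden_size, num_step_classes, bias=False)
```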

For detailed instructions on how to use the ReasonEval-34B model, visit our GitHub repository at [https://github.com/GAIR-NLP/ReasonEval](https://github.com/GAIR-NLP/ReasonEval).
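
For a quick sense of the flow, here is a minimal sketch, not the repository's exact API: the model id, loading class, and label order are assumptions, and the repository defines the canonical loading and scoring code.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed model id; see the GitHub repository for the exact loading code.
model_id = "GAIR/ReasonEval-34B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)
model.eval()

question = "What is 1 + 2 + 3?"
steps = ["1 + 2 = 3.", "3 + 3 = 6.", "So the answer is 6."]

# Score each step given the problem and the solution so far.
for i, step in enumerate(steps, start=1):
    text = question + "\n" + "\n".join(steps[:i])
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)
    print(f"Step {i}: class probabilities = {probs.squeeze().tolist()}")
```
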
## How to Cite
```bibtex
@article{xia2024evaluating,
  title={Evaluating Mathematical Reasoning Beyond Accuracy},
  author={Xia, Shijie and Li, Xuefeng and Liu, Yixin and Wu, Tongshuang and Liu, Pengfei},
  journal={arXiv preprint arXiv:2404.05692},
  year={2024}
}
```