---
license: llama2
language:
- en
pipeline_tag: text-classification
---

# **ReasonEval-34B Model Card**

## Model Description

`ReasonEval-34B` is a 34B-parameter decoder-only language model fine-tuned from [`llemma_34b`](https://huggingface.co/EleutherAI/llemma_34b). Given a mathematical problem and a candidate solution, `ReasonEval-34B` assesses the solution step by step from the following perspectives:

- **Validity**: The step contains no mistakes in calculation or logic.
- **Redundancy**: The step is valid but contributes nothing toward solving the problem.
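As a rough illustration of how per-step class probabilities might be turned into these two scores, the sketch below assumes the classification head emits three logits per step, ordered (negative, neutral, positive). It follows the aggregation as we understand it from the ReasonEval paper: validity is the probability that a step is neutral or positive, and redundancy is the probability that it is neutral. The class order, the function names, and the example logits are assumptions made for illustration, not the repository's actual interface.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def step_scores(logits):
    """Map one step's logits to validity and redundancy scores.

    Assumed class order: (negative, neutral, positive).
    """
    p_negative, p_neutral, p_positive = softmax(logits)
    return {
        "validity": p_neutral + p_positive,  # the step is not wrong
        "redundancy": p_neutral,             # the step is valid but unhelpful
    }


# Made-up logits for a two-step solution (illustrative values only).
example_logits = [[-2.0, 0.5, 3.0], [2.5, -1.0, -0.5]]
scores = [step_scores(logits) for logits in example_logits]

for number, score in enumerate(scores, start=1):
    print(f"step {number}: validity={score['validity']:.3f}, "
          f"redundancy={score['redundancy']:.3f}")
```

Under this definition a step's redundancy can never exceed its validity, which matches the description above: a redundant step is still a valid one.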

With ReasonEval, you can:

- quantify the quality of reasoning steps without relying on human annotators or closed-source models.
- find potentially invalid or redundant steps in solutions, even when the final result is correct.
- select high-quality training data for downstream tasks (e.g., fine-tuning).
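The use cases above could be sketched as follows, assuming each step already has a validity and a redundancy score in [0, 1]. The `StepScore` type, the helper names, and both thresholds are hypothetical and chosen only for illustration. The solution-level aggregation (minimum validity, maximum redundancy over steps) reflects our reading of the paper and should be checked against the GitHub repository.

```python
from dataclasses import dataclass


@dataclass
class StepScore:
    validity: float
    redundancy: float


# Hypothetical thresholds; tune them on held-out data for a real use case.
VALIDITY_THRESHOLD = 0.5
REDUNDANCY_THRESHOLD = 0.5


def flag_steps(steps):
    """Return (index, reason) pairs for steps that look invalid or redundant."""
    flags = []
    for index, step in enumerate(steps):
        if step.validity < VALIDITY_THRESHOLD:
            flags.append((index, "invalid"))
        elif step.redundancy > REDUNDANCY_THRESHOLD:
            flags.append((index, "redundant"))
    return flags


def solution_scores(steps):
    """Aggregate step scores: a solution is only as valid as its weakest step."""
    return (min(s.validity for s in steps),
            max(s.redundancy for s in steps))


def keep_for_training(steps):
    """Keep a solution as training data only if no step was flagged."""
    return not flag_steps(steps)


# Made-up scores for a three-step solution whose final answer may be correct.
solution = [StepScore(0.97, 0.05), StepScore(0.91, 0.80), StepScore(0.20, 0.10)]
print(flag_steps(solution))
print(solution_scores(solution))
print(keep_for_training(solution))
```

Checking validity before redundancy means a step is reported under a single reason, with an outright error taking priority over a merely unhelpful step.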

## Model Details

* **Model type**: `ReasonEval-34B`'s architecture is identical to [`llemma_34b`](https://huggingface.co/EleutherAI/llemma_34b), except that the classification head for next-token prediction is replaced with a classification head that outputs the probability of each class of reasoning step.
* **Language(s)**: English
* **Paper**: [Evaluating Mathematical Reasoning Beyond Accuracy](https://arxiv.org/pdf/2404.05692.pdf)
* **GitHub**: [https://github.com/GAIR-NLP/ReasonEval](https://github.com/GAIR-NLP/ReasonEval)
* **Fine-tuned from model**: [https://huggingface.co/EleutherAI/llemma_34b](https://huggingface.co/EleutherAI/llemma_34b)
* **Fine-tuning data**: [PRM800K](https://github.com/openai/prm800k)

For detailed instructions on how to use the ReasonEval-34B model, visit our GitHub repository at [https://github.com/GAIR-NLP/ReasonEval](https://github.com/GAIR-NLP/ReasonEval).
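To make the model-type description more concrete, here is a toy sketch of the head swap in plain Python. It is not the actual implementation: the sizes are tiny stand-ins, the weights are random, and the number of step classes (three) is an assumption. The only point is that the same final hidden state is projected to a handful of class logits instead of one logit per vocabulary token.

```python
import random

HIDDEN_SIZE = 8        # toy value; the real hidden size is far larger
VOCAB_SIZE = 32        # toy value standing in for the tokenizer vocabulary
NUM_STEP_CLASSES = 3   # assumption: three step classes


def make_linear(in_features, out_features, rng):
    """Create a random weight matrix of shape (out_features, in_features)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_features)]
            for _ in range(out_features)]


def apply_linear(weight, hidden):
    """Project a hidden-state vector through a bias-free linear layer."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


rng = random.Random(0)
hidden_state = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]

# Base language model: one logit per vocabulary token.
lm_head = make_linear(HIDDEN_SIZE, VOCAB_SIZE, rng)
token_logits = apply_linear(lm_head, hidden_state)

# Evaluator: same hidden state, but one logit per step class.
step_head = make_linear(HIDDEN_SIZE, NUM_STEP_CLASSES, rng)
step_logits = apply_linear(step_head, hidden_state)

print(len(token_logits), len(step_logits))
```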

## How to Cite

```bibtex
@article{xia2024evaluating,
  title={Evaluating Mathematical Reasoning Beyond Accuracy},
  author={Xia, Shijie and Li, Xuefeng and Liu, Yixin and Wu, Tongshuang and Liu, Pengfei},
  journal={arXiv preprint arXiv:2404.05692},
  year={2024},
}
```