This is a reward model fine-tuned from Llemma-34b. To score the steps of a solution, encode the question concatenated with the solution (question + solution) and pass the encoded text to the model:

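```python
# Average the model output over its last dimension, map it to (0, 1) with a sigmoid,
# and keep only the positions that close a step: one reward per step.
rewards = model(text).mean(dim=-1).sigmoid()[index]
```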

Here, `index` holds the positions of the special token that ends each step.
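The scoring expression can be traced without PyTorch for a single sequence. This is a minimal, dependency-free sketch under stated assumptions: the model's output for one sequence is a `seq_len × d` grid of floats, and each step in the encoded solution ends with one special token whose ID, `STEP_END_ID`, is hypothetical (the card does not name the token). The helper names `step_end_positions` and `step_rewards` are likewise illustrative, not part of any published API.

```python
import math
from typing import List, Sequence

# Hypothetical ID of the special end-of-step token; the card does not specify it.
STEP_END_ID = 99


def step_end_positions(token_ids: Sequence[int], step_end_id: int) -> List[int]:
    """Return the position of every step-end token in the encoded question + solution."""
    return [i for i, tok in enumerate(token_ids) if tok == step_end_id]


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def step_rewards(outputs: Sequence[Sequence[float]], index: Sequence[int]) -> List[float]:
    """Mirror `outputs.mean(dim=-1).sigmoid()[index]` for one sequence.

    Assumes `outputs` has shape [seq_len, d]: one output vector per token.
    """
    # Average each token's output vector over its last dimension, then squash to (0, 1).
    per_token = [_sigmoid(sum(row) / len(row)) for row in outputs]
    # Keep only the scores at the step-end positions: one reward per step.
    return [per_token[i] for i in index]


if __name__ == "__main__":
    token_ids = [5, 6, STEP_END_ID, 7, 8, STEP_END_ID]  # toy two-step encoding
    outputs = [[0.0, 0.0] for _ in token_ids]  # placeholder model outputs
    index = step_end_positions(token_ids, STEP_END_ID)
    print(index)  # [2, 5]
    print(step_rewards(outputs, index))  # [0.5, 0.5]
```

Locating step ends by token ID assumes each step terminator encodes to exactly one token; if it spans several tokens, the positions would have to be derived another way.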


Format: Safetensors

Model size: 33.7B params

Tensor type: BF16
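These two figures give a rough lower bound on the memory needed just to hold the weights, since BF16 stores each parameter in 2 bytes. The estimate below ignores activations, the KV cache, and framework overhead:

$$
33.7 \times 10^{9}\ \text{params} \times 2\ \tfrac{\text{bytes}}{\text{param}} = 67.4 \times 10^{9}\ \text{bytes} \approx 67.4\ \text{GB}\ (\approx 62.8\ \text{GiB})
$$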


Collection including tkitsers/Llemma-reward-model: Inference Scaling Laws Llemma Models (3 items, updated Oct 22, 2024)

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models