mistral-nemo-gutenberg2-12B-test
mistralai/Mistral-Nemo-Instruct-2407 finetuned on nbeerbower/gutenberg2-dpo.
This is an experimental model intended to benchmark my gutenberg2 dataset.
Method
Finetuned using an RTX 3090 for 3 epochs.
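For reference, a minimal usage sketch with Hugging Face `transformers`, assuming the standard `AutoModelForCausalLM` loading path; the generation settings are illustrative, and `build_prompt` is a hypothetical helper wrapping the Mistral-Instruct `[INST] ... [/INST]` chat format.

```python
# Illustrative usage sketch; model id from this card, settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nbeerbower/mistral-nemo-gutenberg2-12B-test"

def build_prompt(user_message: str) -> str:
    # Mistral-Instruct-style chat template: [INST] ... [/INST]
    return f"[INST] {user_message} [/INST]"

def generate(user_message: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # 12B weights; bf16 fits a 24 GB GPU tightly
        device_map="auto",
    )
    inputs = tokenizer(build_prompt(user_message), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=True)
    # Decode only the newly generated tokens, not the echoed prompt
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```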
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric              | Value |
|---------------------|------:|
| Avg.                | 20.73 |
| IFEval (0-shot)     | 33.85 |
| BBH (3-shot)        | 32.04 |
| MATH Lvl 5 (4-shot) | 10.20 |
| GPQA (0-shot)       |  8.95 |
| MuSR (0-shot)       | 10.97 |
| MMLU-PRO (5-shot)   | 28.39 |
Model tree for nbeerbower/mistral-nemo-gutenberg2-12B-test
Base model: mistralai/Mistral-Nemo-Base-2407
Finetuned from: mistralai/Mistral-Nemo-Instruct-2407
Dataset used to train nbeerbower/mistral-nemo-gutenberg2-12B-test: nbeerbower/gutenberg2-dpo
Evaluation results (Open LLM Leaderboard)
- IFEval (0-shot), strict accuracy: 33.85
- BBH (3-shot), normalized accuracy: 32.04
- MATH Lvl 5 (4-shot), exact match: 10.20
- GPQA (0-shot), acc_norm: 8.95
- MuSR (0-shot), acc_norm: 10.97
- MMLU-PRO (5-shot), accuracy: 28.39