Model Card for Qwen2.5-7B-RoleMRC-dpo

This repository provides a version of Qwen2.5-7B fine-tuned on our proposed RoleMRC dataset. We comply with all licenses stated in Qwen2's work.

Performance

Reference-based Evaluation Result

| Model | BLEU | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | METEOR | BERTScore F1 |
|---|---|---|---|---|---|---|---|
| Qwen2.5-7B-Instruct | 0.0224 | 0.2283 | 0.0621 | 0.1518 | 0.1599 | 0.2490 | 0.8471 |
| Qwen2.5-72B-Instruct | 0.0245 | 0.2350 | 0.0656 | 0.1554 | 0.1660 | 0.2579 | 0.8485 |
| Qwen2.5-7B-RoleMRC-SFT | 0.1963 | 0.4764 | 0.2744 | 0.3959 | 0.3968 | 0.4337 | 0.9063 |
| Qwen2.5-7B-RoleMRC-DPO | 0.1244 | 0.4178 | 0.1916 | 0.3164 | 0.3177 | 0.4205 | 0.8931 |

General Benchmark

| Model | GSM8K (8-shot) | Math (4-shot) | GPQA (0-shot) | IFEval (3-shot) | MMLU-Pro (5-shot) | MMLU (0-shot) | PiQA (3-shot) | MUSR (0-shot) | TruthfulQA (3-shot) | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Qwen2.5-7B | 78.7 | 36.78 | 16.74 | 38.25 | 44.87 | 71.75 | 81.23 | 44.31 | 38.8 | 50.16 |
| Qwen2.5-7B-Instruct | 81.2 | 40.28 | 13.39 | 65.71 | 40.85 | 71.76 | 80.25 | 42.86 | 47.86 | 53.8 |
| Qwen2.5-7B-RoleMRC-SFT | 78.54 | 32.7 | 16.52 | 42.81 | 43.43 | 71.19 | 80.63 | 45.11 | 37.58 | 49.83 |
| Qwen2.5-7B-RoleMRC-DPO | 79.38 | 32.72 | 18.97 | 47.96 | 43.39 | 71.21 | 80.36 | 45.37 | 39.41 | 50.97 |
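
The Avg. column is consistent with an unweighted mean of the nine benchmark scores in each row. The card does not state how the average was computed, so the sketch below is an inference from the reported numbers, not a description of the authors' tooling. The names `SCORES` and `average` are ours, chosen for illustration.

```python
# Scores copied from the General Benchmark table, in column order:
# GSM8K, Math, GPQA, IFEval, MMLU-Pro, MMLU, PiQA, MUSR, TruthfulQA.
SCORES = {
    "Qwen2.5-7B": [78.7, 36.78, 16.74, 38.25, 44.87, 71.75, 81.23, 44.31, 38.8],
    "Qwen2.5-7B-Instruct": [81.2, 40.28, 13.39, 65.71, 40.85, 71.76, 80.25, 42.86, 47.86],
    "Qwen2.5-7B-RoleMRC-SFT": [78.54, 32.7, 16.52, 42.81, 43.43, 71.19, 80.63, 45.11, 37.58],
    "Qwen2.5-7B-RoleMRC-DPO": [79.38, 32.72, 18.97, 47.96, 43.39, 71.21, 80.36, 45.37, 39.41],
}


def average(scores):
    """Unweighted mean of the per-benchmark scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    for model, scores in SCORES.items():
        print(f"{model}: {average(scores)}")
```

Under this reading, the DPO model recovers roughly one point of average score over the SFT model, with most of the difference coming from IFEval, GPQA and TruthfulQA.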

Evaluation Details

Five of the benchmarks were run with lm-evaluation-harness, under these conditions:

  • GSM8K: 8-shot, report strict match
  • IFEval: 3-shot, report instruction-level strict accuracy
  • PiQA: 3-shot, report accuracy
  • MMLU: 0-shot, report normalized accuracy
  • TruthfulQA: 3-shot, report accuracy of single-true mc1 setting
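
As a rough illustration of the settings listed above, the following sketch builds one lm-evaluation-harness command line per benchmark. The task identifiers (`gsm8k`, `ifeval`, `piqa`, `mmlu`, `truthfulqa_mc1`) and the CLI flags are assumptions based on recent releases of the harness and may differ by version. The helper `build_eval_command` is hypothetical, and the card does not say which exact invocation was used.

```python
from typing import List

# Few-shot counts from the list above, keyed by assumed harness task names.
FEWSHOT_BY_TASK = {
    "gsm8k": 8,           # report strict match
    "ifeval": 3,          # report instruction-level strict accuracy
    "piqa": 3,            # report accuracy
    "mmlu": 0,            # report normalized accuracy
    "truthfulqa_mc1": 3,  # report accuracy, single-true (mc1) setting
}


def build_eval_command(model_id: str, task: str) -> List[str]:
    """Return an argv list for one `lm_eval` run (assumed CLI layout)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", task,
        "--num_fewshot", str(FEWSHOT_BY_TASK[task]),
    ]


if __name__ == "__main__":
    for task in FEWSHOT_BY_TASK:
        print(" ".join(build_eval_command("jiazhengli/Qwen2.5-7B-RoleMRC-dpo", task)))
```

Each printed line can be pasted into a shell where the harness is installed. The metric to read from the output is the one noted in the comment for that task.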

Input Format

The model is trained to use the following format:

<|start_header_id|>user<|end_header_id|>

{PROMPT}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

{Response}
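
The format above can be assembled with plain string formatting. This is a minimal sketch for a single user turn. The helper name `build_prompt` is ours, and the exact newline placement is read off the template as displayed, so it may differ from what the tokenizer's own chat template produces.

```python
def build_prompt(user_message: str) -> str:
    """Wrap one user turn in the header and end-of-turn markers shown above.

    The returned string ends right after the assistant header, so the model's
    generated text takes the place of {Response}.
    """
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>\n"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


if __name__ == "__main__":
    print(build_prompt("Who are you?"))
```

When the tokenizer shipped with the model defines a chat template, using that template is likely the safer choice. The helper is mainly useful for inspecting what the model expects to see.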

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-5
  • total_train_batch_size: 16
  • optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
  • lr_scheduler_type: cosine
  • DPO beta: 0.1
  • lr_scheduler_warmup_ratio: 0.04
  • num_epochs: 1.0
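
To make two of these settings concrete, the sketch below shows how a cosine schedule with a 0.04 warmup ratio shapes the learning rate, and how the DPO beta of 0.1 enters the standard DPO loss for a single preference pair. The linear warmup and the decay to zero are assumptions (a common default for cosine schedulers), and the function names are ours. The card does not specify the training code, so this is not the authors' implementation.

```python
import math

LEARNING_RATE = 2e-5   # learning_rate
WARMUP_RATIO = 0.04    # lr_scheduler_warmup_ratio
DPO_BETA = 0.1         # DPO beta


def lr_at_step(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup (assumed), then cosine decay to 0."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = DPO_BETA) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    `margin` is how much more the policy prefers the chosen response over the
    rejected one, relative to the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for step in (0, 20, 40, 500, 1000):
        print(step, lr_at_step(step, total))
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

With beta at 0.1, a log-probability margin of 2 only lowers the per-pair loss slightly below log 2, which means the policy is pulled away from the reference model gently.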