jiazhengli
/

Qwen2.5-7B-RoleMRC-sft

Model card Files Files and versions Community

Qwen2.5-7B-RoleMRC-sft / README.md

jiazhengli's picture

Update README.md

ec88d5b verified 1 day ago

|

history blame contribute delete

3.12 kB

	---
	model-index:
	- name: jiazhengli/Qwen2.5-7B-RoleMRC-sft
	results: []
	datasets:
	- Junrulu/RoleMRC
	language:
	- en
	base_model: Qwen/Qwen2.5-7B
	license: llama3
	---

	# Model Card for Qwen2.5-7B-RoleMRC-sft

	This repository provides a fine-tuned version of Qwen2.5-7B, using our proposed [RoleMRC dataset](https://huggingface.co/datasets/Junrulu/RoleMRC). We obey all licenses mentioned in Qwen 2's work.

	## Performance

	Reference-based Evaluation Result

	\| Model \| BLEU \| ROUGE-1 \| ROUGE-2 \| ROUGE-L \| ROUGE-Lsum \| METEOR \| BERTScore F1 \|
	\|--------------------------------\|--------\|---------\|---------\|---------\|------------\|--------\|-----------\|
	\| Qwen2.5-7B-Instruct \| 0.0224 \| 0.2283 \| 0.0621 \| 0.1518 \| 0.1599 \| 0.2490 \| 0.8471 \| \|
	\| Qwen2.5-72B-Instruct \| 0.0245 \| 0.2350 \| 0.0656 \| 0.1554 \| 0.1660 \| 0.2579 \| 0.8485 \| \|
	\| Qwen2.5-7B-RoleMRC-SFT \| 0.1963 \| 0.4764 \| 0.2744 \| 0.3959 \| 0.3968 \| 0.4337 \| 0.9063 \| \|
	\| Qwen2.5-7B-RoleMRC-DPO \| 0.1244 \| 0.4178 \| 0.1916 \| 0.3164 \| 0.3177 \| 0.4205 \| 0.8931 \| \|

	General Benchmark

	\| Model \| GSM8K 8-shot \| Math 4-shot \| GPQA 0-shot \| IFEval 3-shot \| MMLU-Pro 5-shot \| MMLU 0-shot \| PiQA 3-shot \| MUSR 0-shot \| TruthfulQA 3-shot\| / Avg. \|
	\|----------------------------------------\|-------------\|------------\|-------------\|--------------\|---------------\|-----------\|-----------\|-----------\|------------------------\|------\|
	\| QWEN2.5-7B \| 78.7 \| 36.78 \| 16.74 \| 38.25 \| 44.87 \| 71.75 \| 81.23 \| 44.31 \| 38.8 \| 50.16 \|
	\| QWEN2.5-7B-INSTRUCT \| 81.2 \| 40.28 \| 13.39 \| 65.71 \| 40.85 \| 71.76 \| 80.25 \| 42.86 \| 47.86 \| 53.8 \|
	\| QWEN2.5-7B-ROLEMRC-SFT \| 78.54 \| 32.7 \| 16.52 \| 42.81 \| 43.43 \| 71.19 \| 80.63 \| 45.11 \| 37.58 \| 49.83 \|
	\| QWEN2.5-7B-ROLEMRC-DPO \| 79.38 \| 32.72 \| 18.97 \| 47.96 \| 43.39 \| 71.21 \| 80.36 \| 45.37 \| 39.41 \| 50.97 \|

	## Evaluation Details
	Five conditional benchmarks, using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):
	- GSM8K: 8-shot, report strict match
	- IFEval: 3-shot, report instruction-level strict accuracy
	- PiQA: 3-shot, report accuracy
	- MMLU: 0-shot, report normalized accuracy
	- TruthfulQA: 3-shot, report accuracy of single-true mc1 setting

	## Input Format

	The model is trained to use the following format:
	```
	<\|start_header_id\|>user<\|end_header_id\|>

	{PROMPT}<\|eot_id\|>
	<\|start_header_id\|>assistant<\|end_header_id\|>

	{Response}
	```

	## Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-5
	- total_train_batch_size: 16
	- optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.04
	- num_epochs: 1.0