jiazhengli/Qwen2.5-7B-RoleMRC-sft

jiazhengli committed 1 day ago

Commit 1f00467 · verified · 1 Parent(s): 031e1b7

Create README.md

Files changed (1)

README.md +65 -0

README.md ADDED

	@@ -0,0 +1,65 @@

+---
+model-index:
+- name: jiazhengli/Qwen2.5-7B-RoleMRC-sft
+  results: []
+datasets:
+- Junrulu/RoleMRC
+language:
+- en
+base_model: Qwen/Qwen2.5-7B
+license: llama3
+---
+# Model Card for Qwen2.5-7B-RoleMRC-sft
+This repository provides a fine-tuned version of Qwen2.5-7B, trained on our proposed [RoleMRC dataset](https://huggingface.co/datasets/Junrulu/RoleMRC). We comply with all licenses mentioned in llama3's work.
+## Performance
+Reference-based Evaluation Results
+| Model                      | BLEU   | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | METEOR | BERTScore F1 |
+|----------------------------|--------|---------|---------|---------|------------|--------|--------------|
+| Qwen2.5-7B-Instruct        | 0.0224 | 0.2283  | 0.0621  | 0.1518  | 0.1599     | 0.2490 | 0.8471       |
+| Qwen2.5-72B-Instruct       | 0.0245 | 0.2350  | 0.0656  | 0.1554  | 0.1660     | 0.2579 | 0.8485       |
+| **Qwen2.5-7B-RoleMRC-SFT** | 0.1963 | 0.4764  | 0.2744  | 0.3959  | 0.3968     | 0.4337 | 0.9063       |
+| Qwen2.5-7B-RoleMRC-DPO     | 0.1244 | 0.4178  | 0.1916  | 0.3164  | 0.3177     | 0.4205 | 0.8931       |
+General Benchmark
+| Model                      | GSM8K 8-shot | Math 4-shot | GPQA 0-shot | IFEval 3-shot | MMLU-Pro 5-shot | MMLU 0-shot | PiQA 3-shot | MUSR 0-shot | TruthfulQA 3-shot | Avg.  |
+|----------------------------|--------------|-------------|-------------|---------------|-----------------|-------------|-------------|-------------|-------------------|-------|
+| Qwen2.5-7B                 | 78.7         | 36.78       | 16.74       | 38.25         | 44.87           | 71.75       | 81.23       | 44.31       | 38.8              | 50.16 |
+| Qwen2.5-7B-Instruct        | 81.2         | 40.28       | 13.39       | 65.71         | 40.85           | 71.76       | 80.25       | 42.86       | 47.86             | 53.8  |
+| **Qwen2.5-7B-RoleMRC-SFT** | 78.54        | 32.7        | 16.52       | 42.81         | 43.43           | 71.19       | 80.63       | 45.11       | 37.58             | 49.83 |
+| Qwen2.5-7B-RoleMRC-DPO     | 79.38        | 32.72       | 18.97       | 47.96         | 43.39           | 71.21       | 80.36       | 45.37       | 39.41             | 50.97 |
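The average column in the General Benchmark table is consistent with an unweighted mean of the nine benchmark scores in each row. The card does not say how the average was computed, so the sketch below only checks that assumption against the published numbers; the dictionary layout and the `row_average` helper are illustrative names, not part of the original card.

```python
# Scores copied from the General Benchmark table, in column order:
# GSM8K, Math, GPQA, IFEval, MMLU-Pro, MMLU, PiQA, MUSR, TruthfulQA.
SCORES = {
    "Qwen2.5-7B": [78.7, 36.78, 16.74, 38.25, 44.87, 71.75, 81.23, 44.31, 38.8],
    "Qwen2.5-7B-Instruct": [81.2, 40.28, 13.39, 65.71, 40.85, 71.76, 80.25, 42.86, 47.86],
    "Qwen2.5-7B-RoleMRC-SFT": [78.54, 32.7, 16.52, 42.81, 43.43, 71.19, 80.63, 45.11, 37.58],
    "Qwen2.5-7B-RoleMRC-DPO": [79.38, 32.72, 18.97, 47.96, 43.39, 71.21, 80.36, 45.37, 39.41],
}


def row_average(scores):
    """Unweighted mean of one table row, rounded to two decimals (assumed method)."""
    return round(sum(scores) / len(scores), 2)


averages = {model: row_average(scores) for model, scores in SCORES.items()}

for model, avg in averages.items():
    print(f"{model}: {avg}")
```

Under this assumption the recomputed values match the reported averages for all four rows.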
+## Evaluation Details
+The following five conditional benchmarks are evaluated using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):
+- GSM8K: 8-shot, report strict match
+- IFEval: 3-shot, report instruction-level strict accuracy
+- PiQA: 3-shot, report accuracy
+- MMLU: 0-shot, report normalized accuracy
+- TruthfulQA: 3-shot, report accuracy of single-true mc1 setting
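The settings listed above could be turned into lm-evaluation-harness command lines roughly as follows. This is a sketch under assumptions: the task identifiers (`gsm8k`, `ifeval`, `piqa`, `mmlu`, `truthfulqa_mc1`) and the choice of the `hf` model backend are guesses at what was used, since the card gives only the shot counts and reported metrics. The snippet only builds the command strings; it does not run any evaluation.

```python
import shlex

# Repository id taken from this model card.
MODEL_ID = "jiazhengli/Qwen2.5-7B-RoleMRC-sft"

# (task, few-shot count). Shot counts come from the card; the task names are
# assumed lm-evaluation-harness identifiers and may differ from the ones
# the authors actually used.
TASKS = [
    ("gsm8k", 8),
    ("ifeval", 3),
    ("piqa", 3),
    ("mmlu", 0),
    ("truthfulqa_mc1", 3),
]


def build_command(task, shots, model_id=MODEL_ID):
    """Return one lm_eval command line for a single task."""
    args = [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", task,
        "--num_fewshot", str(shots),
    ]
    return shlex.join(args)


commands = [build_command(task, shots) for task, shots in TASKS]

for command in commands:
    print(command)
```

Each task is given its own command because the few-shot count differs per benchmark.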
+## Input Format
+The model is trained to use the following format:
+```
+<|start_header_id|>user<|end_header_id|>
+{PROMPT}<|eot_id|>
+<|start_header_id|>assistant<|end_header_id|>
+{Response}
+```
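A minimal sketch of filling in the template above for a single user turn. The `build_prompt` helper is a hypothetical name. The card does not state whether a beginning-of-sequence token or extra blank lines are expected, so the sketch follows the template literally, one line per template line, and stops where the model's response should begin.

```python
def build_prompt(user_message: str) -> str:
    """Fill the single-turn template from the model card with a user message.

    The returned string ends right after the assistant header, so the
    model's generation takes the place of {Response}.
    """
    return (
        "<|start_header_id|>user<|end_header_id|>\n"
        f"{user_message}<|eot_id|>\n"
        "<|start_header_id|>assistant<|end_header_id|>\n"
    )


# Illustrative message, not taken from the RoleMRC dataset.
prompt = build_prompt("Stay in character and answer using the passage.")
print(prompt)
```

If the tokenizer shipped with the checkpoint defines a chat template, using it is likely the safer choice than hand-built strings.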
+## Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1e-5
+- total_train_batch_size: 16
+- optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.04
+- num_epochs: 1.0
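To make the scheduler settings concrete, the sketch below computes the learning rate at a given step for a cosine schedule with linear warmup, using the listed `learning_rate` (1e-5) and `lr_scheduler_warmup_ratio` (0.04). The total step count is not given in the card, so `TOTAL_STEPS = 1000` is a made-up value for illustration. Linear warmup followed by cosine decay to zero is an assumption about the scheduler's exact shape.

```python
import math

PEAK_LR = 1e-5        # learning_rate from the card
WARMUP_RATIO = 0.04   # lr_scheduler_warmup_ratio from the card
TOTAL_STEPS = 1000    # hypothetical; the card does not report the step count

# Number of warmup steps, rounded up (assumed rounding rule).
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, lr_at(step))
```

With these assumptions the rate climbs from zero to its peak over the first 4% of steps and then decays smoothly to zero by the end of the single epoch.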