Model Card for Llama-3.1-8B-RoleMRC-sft

This repository provides a version of Llama-3.1-8B fine-tuned on our proposed RoleMRC dataset. We comply with all licenses stated in the Llama 3 release.

Performance

Reference-based Evaluation Result

Model                    BLEU    ROUGE-1  ROUGE-2  ROUGE-L  ROUGE-Lsum  METEOR  BERTScore F1
LLaMA3.1-8B-Instruct     0.0226  0.2277   0.0615   0.1509   0.1650      0.2594  0.8478
LLaMA3.1-70B-Instruct    0.0232  0.2258   0.0646   0.1500   0.1661      0.2632  0.8480
LLaMA3.1-8B-RoleMRC-SFT  0.1782  0.4628   0.2676   0.3843   0.3853      0.3975  0.8831
LLaMA3.1-8B-RoleMRC-DPO  0.1056  0.3989   0.1785   0.2988   0.3001      0.4051  0.8805

General Benchmark

Model                    GSM8K  Math   GPQA   IFEval  MMLU-Pro  MMLU   PiQA   MUSR   TruthfulQA  Avg.
Shots                    8      4      0      3       5         0      3      0      3           -
LLaMA3.1-8B              48.98  17.78  12.5   16.67   35.21     63.27  81.77  38.1   28.52       38.09
LLaMA3.1-8B-Instruct     77.41  34.1   12.72  57.67   40.77     68.1   82.1   39.81  36.47       49.91
LLaMA3.1-8B-RoleMRC-SFT  56.18  12.78  19.64  42.09   31.58     59.3   82.64  40.34  35.01       42.17
LLaMA3.1-8B-RoleMRC-DPO  58.53  13.5   20.09  46.64   31.8      59.96  82.7   39.42  37.33       43.33
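
The Avg. column appears to be the unweighted mean of the nine benchmark scores in each row. The card does not state this explicitly, so the short check below treats it as an assumption and simply recomputes the mean from the values in the table. The helper name mean_score is illustrative.

```python
# Scores copied from the General Benchmark table, in column order
# (GSM8K, Math, GPQA, IFEval, MMLU-Pro, MMLU, PiQA, MUSR, TruthfulQA).
scores = {
    "LLaMA3.1-8B": [48.98, 17.78, 12.5, 16.67, 35.21, 63.27, 81.77, 38.1, 28.52],
    "LLaMA3.1-8B-Instruct": [77.41, 34.1, 12.72, 57.67, 40.77, 68.1, 82.1, 39.81, 36.47],
    "LLaMA3.1-8B-RoleMRC-SFT": [56.18, 12.78, 19.64, 42.09, 31.58, 59.3, 82.64, 40.34, 35.01],
    "LLaMA3.1-8B-RoleMRC-DPO": [58.53, 13.5, 20.09, 46.64, 31.8, 59.96, 82.7, 39.42, 37.33],
}

# Avg. values as reported in the table.
reported_avg = {
    "LLaMA3.1-8B": 38.09,
    "LLaMA3.1-8B-Instruct": 49.91,
    "LLaMA3.1-8B-RoleMRC-SFT": 42.17,
    "LLaMA3.1-8B-RoleMRC-DPO": 43.33,
}


def mean_score(values):
    """Unweighted mean, rounded to two decimals like the table."""
    return round(sum(values) / len(values), 2)


for model, values in scores.items():
    print(f"{model}: computed {mean_score(values):.2f}, reported {reported_avg[model]:.2f}")
```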

Evaluation Details

Five conditional benchmarks, using lm-evaluation-harness:

  • GSM8K: 8-shot, report strict match
  • IFEval: 3-shot, report instruction-level strict accuracy
  • PiQA: 3-shot, report accuracy
  • MMLU: 0-shot, report normalized accuracy
  • TruthfulQA: 3-shot, report accuracy of single-true mc1 setting
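
The settings listed above can be turned into lm-evaluation-harness command lines. The sketch below only builds the command strings and does not run anything. The task identifiers (gsm8k, ifeval, piqa, mmlu, truthfulqa_mc1) and the use of the hf model backend are assumptions, since the card does not give the exact invocation, and the helper build_command is hypothetical.

```python
import shlex

# Repository name of this model on the Hugging Face Hub.
MODEL_ID = "jiazhengli/Llama-3.1-8B-RoleMRC-sft"

# (task, few-shot count) pairs taken from the list above.
# The task identifiers are assumed lm-evaluation-harness names.
SETTINGS = [
    ("gsm8k", 8),
    ("ifeval", 3),
    ("piqa", 3),
    ("mmlu", 0),
    ("truthfulqa_mc1", 3),
]


def build_command(task, num_fewshot, model_id=MODEL_ID):
    """Return an lm_eval command line for one task as a shell-safe string."""
    args = [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", task,
        "--num_fewshot", str(num_fewshot),
    ]
    return shlex.join(args)


for task, shots in SETTINGS:
    print(build_command(task, shots))
```

Which metric is read from each run (strict match, instruction-level strict accuracy, normalized accuracy, and so on) follows the list above and is selected from the harness output rather than on the command line.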

Input Format

The model is trained to use the following format:

<|start_header_id|>user<|end_header_id|>

{PROMPT}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

{Response}
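
A minimal sketch of filling this template for a single user turn is shown below. The function name build_prompt is hypothetical. The string ends after the assistant header so that the model generates the response. The template above shows no beginning-of-text token, so none is added here; whether one is needed depends on the tokenizer settings.

```python
def build_prompt(user_message: str) -> str:
    """Format one user turn using the template shown above."""
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>\n"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


# Example usage with an illustrative question.
print(build_prompt("Who are you playing in this conversation?"))
```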

Training Hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-5
  • total_train_batch_size: 16
  • optimizer: AdamW with beta1 0.9, beta2 0.999 and epsilon 1e-8
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.04
  • num_epochs: 1.0
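
To make the schedule concrete, the sketch below computes the learning rate at a given step for a cosine schedule with a peak of 1e-5 and a warmup ratio of 0.04. It assumes linear warmup from zero and cosine decay to zero, which is a common convention but is not stated in the card. The total step count in the example is hypothetical, since the card gives neither the dataset size nor the number of steps.

```python
import math

PEAK_LR = 1e-5        # learning_rate from the list above
WARMUP_RATIO = 0.04   # lr_scheduler_warmup_ratio from the list above


def lr_at_step(step: int, total_steps: int) -> float:
    """Learning rate at `step`, assuming linear warmup then cosine decay to zero."""
    warmup_steps = int(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length, chosen only for illustration.
TOTAL_STEPS = 1000
for step in (0, 20, 40, 520, 1000):
    print(step, lr_at_step(step, TOTAL_STEPS))
```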