flydust committed
Commit 9299d8e · verified · 1 Parent(s): ced620b

Update README.md

Files changed (1)
  1. README.md +8 -19
README.md CHANGED
@@ -5,23 +5,14 @@ tags:
 - trl
 - dpo
 - generated_from_trainer
-- trl
-- dpo
-- generated_from_trainer
 datasets:
-- flydust/llama3-ultrafeedback-armorm-2
+- princeton-nlp/llama3-ultrafeedback-armorm
 model-index:
-- name: Llama-3.1-8B-Magpie-Pro-MTR-UltraDPO-1
+- name: Llama-3.1-8B-Magpie-Align-v0.1-RC1
   results: []
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/uw-nsl/huggingface/runs/ro30b4xx)
-# Llama-3.1-8B-Magpie-Pro-MTR-UltraDPO-1
-
-This model is a fine-tuned version of [Magpie-Align/Llama-3.1-8B-Magpie-Mix-300KMT-150KR](https://huggingface.co/Magpie-Align/Llama-3.1-8B-Magpie-Mix-300KMT-150KR) on the flydust/llama3-ultrafeedback-armorm-2 dataset.
+This model is a fine-tuned version of [Magpie-Align/Llama-3.1-8B-Magpie-Align-SFT-v0.1](https://huggingface.co/Magpie-Align/Llama-3.1-8B-Magpie-Align-SFT-v0.1) on the princeton-nlp/llama3-ultrafeedback-armorm dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.3290
 - Rewards/chosen: -4.8185
@@ -35,15 +26,13 @@ It achieves the following results on the evaluation set:

 ## Model description

-More information needed
-
-## Intended uses & limitations
-
-More information needed
+More details will be added soon.

-## Training and evaluation data
+## Benchmark

-More information needed
+- **MT-Bench: 8.375 (1st Turn), 7.650 (Second Turn), 8.013 (Average)**
+- **Alpaca Eval 2 (GPT-4-Turbo-1106): 45.73 (LC), 52.79 (WR)**
+- **Arena Hard: 42.4**

 ## Training procedure
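For context, the `trl`, `dpo`, and `generated_from_trainer` tags plus the `Rewards/chosen` metric in the card indicate a trl `DPOTrainer` run on the listed dataset. Below is a minimal sketch of such a run; the hyperparameter values, trl version, and dataset preprocessing are assumptions for illustration (the diff records none of them), not the settings used for this checkpoint:

```python
# Minimal DPO sketch matching this card's tags (trl, dpo, generated_from_trainer).
# All hyperparameter values are illustrative assumptions; the diff does not
# record the real ones. Written against the trl 0.9-era DPOTrainer API
# (newer releases rename `tokenizer` to `processing_class`).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Magpie-Align/Llama-3.1-8B-Magpie-Align-SFT-v0.1"  # SFT base named in the card
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference dataset named in the card; assumes its prompt/chosen/rejected
# columns are already in the format DPOTrainer expects.
train_dataset = load_dataset("princeton-nlp/llama3-ultrafeedback-armorm", split="train")

args = DPOConfig(
    output_dir="Llama-3.1-8B-Magpie-Align-v0.1-RC1",
    beta=0.1,                        # assumed; beta is not stated in the diff
    per_device_train_batch_size=1,   # assumed
    gradient_accumulation_steps=16,  # assumed
    learning_rate=5e-7,              # assumed
    num_train_epochs=1,              # assumed
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # trl clones the model as the frozen DPO reference
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

The `Rewards/chosen` value reported in the card is the quantity trl logs during DPO training: the mean of beta times the log-probability ratio between the policy and the frozen reference model on chosen responses.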