Apel-sin committed
Commit edd90cc · 1 Parent(s): 82c43f2

add measurement.json

Files changed (2):
  1. README.md +209 -0
  2. measurement.json +0 -0
README.md ADDED
---
license: cc-by-nc-4.0
tags:
- generated_from_trainer
base_model: lightblue/suzume-llama-3-8B-multilingual
model-index:
- name: workspace/llm_training/axolotl/llama3-multilingual-orpo/output_mitsu_half_borda
  results: []
---
# Suzume ORPO

<p align="center">
<img width=500 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/kWQSu02YfgYdUQqv4s5lq.png" alt="Suzume with Mitsu - a Japanese tree sparrow with honey on it"/>
</p>

[[Paper]](https://arxiv.org/abs/2405.18952) [[Dataset]](https://huggingface.co/datasets/lightblue/mitsu)

This is Suzume ORPO, an ORPO-trained fine-tune of the [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) model using our [lightblue/mitsu](https://huggingface.co/datasets/lightblue/mitsu) dataset.

We have trained several versions of this model using ORPO, so we recommend using the best-performing model from our tests, [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half).

Note that this model has a non-commercial license, as we used the Command R and Command R+ models to generate our training data for this model ([lightblue/mitsu](https://huggingface.co/datasets/lightblue/mitsu)).

We are currently working on developing a commercially usable model, so stay tuned for that!

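If you just want to try the model, the following is a minimal inference sketch using 🤗 Transformers. It is not an official snippet from this repository: the dtype, device placement, prompt, and generation settings are illustrative only, and it assumes the tokenizer ships the chat template used during training.

```python
# Minimal inference sketch (illustrative only, not an official example from this repo).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumes the tokenizer bundles the chat template used in training.
messages = [{"role": "user", "content": "Hello! Can you introduce yourself in French?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
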
# Model list

We have ORPO-trained the following models using different proportions of the [lightblue/mitsu](https://huggingface.co/datasets/lightblue/mitsu) dataset (see the sketch after this list for how such top/bottom preference pairs can be constructed):
* Trained on the top/bottom responses of all prompts in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-full](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-full)
* Trained on the top/bottom responses of the prompts of the 75% most consistently ranked responses in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top75](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top75)
* Trained on the top/bottom responses of the prompts of the 50% most consistently ranked responses in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half)
* Trained on the top/bottom responses of the prompts of the 25% most consistently ranked responses in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top25](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top25)

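To make the "top/bottom responses" wording above concrete, here is a small, purely illustrative sketch of turning one prompt's ranked responses into an ORPO-style `(prompt, chosen, rejected)` pair. The record structure and field names are assumptions for illustration only; see [lightblue/mitsu](https://huggingface.co/datasets/lightblue/mitsu) for the actual schema.

```python
# Hypothetical illustration: build (prompt, chosen, rejected) pairs from ranked responses.
# Field names ("prompt", "responses", "ranks") are assumptions, not the actual mitsu schema.
from typing import Dict


def to_preference_pair(example: Dict) -> Dict:
    """Pick the best- and worst-ranked response for one prompt (lower rank = better)."""
    ranked = sorted(zip(example["ranks"], example["responses"]), key=lambda x: x[0])
    return {
        "prompt": example["prompt"],
        "chosen": ranked[0][1],     # top-ranked response
        "rejected": ranked[-1][1],  # bottom-ranked response
    }


example = {
    "prompt": "Explain ORPO in one sentence.",
    "responses": ["Good answer", "Mediocre answer", "Poor answer"],
    "ranks": [1, 2, 3],
}
print(to_preference_pair(example))
```
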
# Model results

We compare the MT-Bench scores across 6 languages for our 4 ORPO-trained models, as well as some baselines:

* [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) - The foundation model that our models are ultimately built upon
* [Nexusflow/Starling-LM-7B-beta](https://huggingface.co/Nexusflow/Starling-LM-7B-beta) - The highest-performing open model on the Chatbot Arena that is of a similar size to ours
* gpt-3.5-turbo - A fairly high-quality (although not state-of-the-art) proprietary LLM
* [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) - The base model from which we train our ORPO fine-tunes

| **MT-Bench language** | **meta-llama/Meta-Llama-3-8B-Instruct** | **Nexusflow/Starling-LM-7B-beta** | **gpt-3.5-turbo** | **lightblue/suzume-llama-3-8B-multilingual** | **lightblue/suzume-llama-3-8B-multilingual-orpo-borda-full** | **lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top75** | **lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half** | **lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top25** |
|-----------------------|-----------------------------------------|-----------------------------------|-------------------|----------------------------------------------|--------------------------------------------------------------|---------------------------------------------------------------|--------------------------------------------------------------|---------------------------------------------------------------|
| **Chinese 🇨🇳** | NaN | 6.97 | 7.55 | 7.11 | 7.65 | **7.77** | 7.74 | 7.44 |
| **English 🇺🇸** | 7.98 | 7.92 | **8.26** | 7.73 | 7.98 | 7.94 | 7.98 | 8.22 |
| **French 🇫🇷** | NaN | 7.29 | 7.74 | 7.66 | **7.84** | 7.46 | 7.78 | 7.81 |
| **German 🇩🇪** | NaN | 6.99 | 7.68 | 7.26 | 7.28 | 7.64 | 7.7 | **7.71** |
| **Japanese 🇯🇵** | NaN | 6.22 | **7.84** | 6.56 | 7.2 | 7.12 | 7.34 | 7.04 |
| **Russian 🇷🇺** | NaN | 8.28 | 7.94 | 8.19 | 8.3 | 8.74 | **8.94** | 8.81 |

We can see a noticeable improvement in most languages compared to the base model. We also find that, for a number of languages, our ORPO models achieve the highest score of all the models we evaluated.

# Training data

We trained this model using the [lightblue/mitsu_tophalf_borda](https://huggingface.co/datasets/lightblue/mitsu_tophalf_borda) dataset (the dataset listed in the axolotl configuration below).

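To inspect the preference data before training, here is a quick sketch using the `datasets` library. The split and column names are assumptions; check the dataset card for the actual schema.

```python
from datasets import load_dataset

# Assumes a "train" split; column names depend on the dataset's actual schema.
ds = load_dataset("lightblue/mitsu_tophalf_borda", split="train")
print(ds.column_names)  # verify the fields (e.g. prompt / chosen / rejected) before use
print(ds[0])            # look at one preference example
```
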
# Training configuration

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: lightblue/suzume-llama-3-8B-multilingual
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer # PreTrainedTokenizerFast

load_in_8bit: false
load_in_4bit: false
strict: false

rl: orpo
orpo_alpha: 0.1
remove_unused_columns: false

chat_template: chatml
datasets:
  - path: lightblue/mitsu_tophalf_borda
    type: orpo.chat_template
    conversation: llama-3
dataset_prepared_path: /workspace/llm_training/axolotl/llama3-multilingual-orpo/prepared_mitsu_half_borda
val_set_size: 0.02
output_dir: /workspace/llm_training/axolotl/llama3-multilingual-orpo/output_mitsu_half_borda

sequence_len: 8192
sample_packing: false
pad_to_sequence_len: true

use_wandb: true
wandb_project: axolotl
wandb_entity: peterd
wandb_name: mitsu_half_borda

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 8e-6

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 20
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.0
special_tokens:
  pad_token: <|end_of_text|>
```

</details><br>

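For reference, the `rl: orpo` setting above applies Odds Ratio Preference Optimization, which adds an odds-ratio preference term (weighted by `orpo_alpha`) to the usual supervised loss on the chosen response. The following is a minimal sketch of that per-example objective, not the exact axolotl/TRL implementation; it assumes mean per-token log-probabilities for the chosen and rejected responses have already been computed.

```python
import torch
import torch.nn.functional as F


def orpo_loss(chosen_logps: torch.Tensor,
              rejected_logps: torch.Tensor,
              alpha: float = 0.1) -> torch.Tensor:
    """Sketch of the ORPO objective: NLL on the chosen response plus an
    odds-ratio penalty that pushes the chosen response above the rejected one.

    chosen_logps / rejected_logps: mean per-token log-probabilities, shape (batch,).
    """
    # log odds(y|x) = log p - log(1 - p), with log(1 - p) = log1p(-exp(log p))
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio term: -log sigmoid(log odds ratio between chosen and rejected)
    ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # Standard language-modeling (SFT) loss on the chosen response
    sft_loss = -chosen_logps

    return (sft_loss + alpha * ratio_loss).mean()


# Toy usage with made-up log-probabilities
chosen = torch.tensor([-0.8, -1.1])
rejected = torch.tensor([-1.5, -1.9])
print(orpo_loss(chosen, rejected))
```
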
# workspace/llm_training/axolotl/llama3-multilingual-orpo/output_mitsu_half_borda

This model is a fine-tuned version of [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) on the [lightblue/mitsu_tophalf_borda](https://huggingface.co/datasets/lightblue/mitsu_tophalf_borda) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0935

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1

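For reference, the total train batch size of 32 above is the product of micro_batch_size 1 × gradient_accumulation_steps 8 × num_devices 4.
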
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 7.6299 | 0.02 | 1 | 7.7014 |
| 7.041 | 0.07 | 3 | 3.9786 |
| 0.6089 | 0.15 | 6 | 0.1393 |
| 0.1308 | 0.22 | 9 | 0.1244 |
| 0.1051 | 0.29 | 12 | 0.1112 |
| 0.1021 | 0.36 | 15 | 0.1063 |
| 0.0861 | 0.44 | 18 | 0.1026 |
| 0.1031 | 0.51 | 21 | 0.0979 |
| 0.0996 | 0.58 | 24 | 0.0967 |
| 0.0923 | 0.65 | 27 | 0.0960 |
| 0.1025 | 0.73 | 30 | 0.0944 |
| 0.1103 | 0.8 | 33 | 0.0939 |
| 0.0919 | 0.87 | 36 | 0.0937 |
| 0.104 | 0.94 | 39 | 0.0935 |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.0

# How to cite

```tex
@article{devine2024sure,
  title={Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets},
  author={Devine, Peter},
  journal={arXiv preprint arXiv:2405.18952},
  year={2024}
}
```

# Developer

Peter Devine - ([ptrdvn](https://huggingface.co/ptrdvn))
measurement.json ADDED