Tags: PEFT · Safetensors · qwen2 · axolotl · Generated from Trainer
lbourdois committed (verified)
Commit 2942b15 · Parent(s): 31052d0

Improve language tag


Hi! As the model is multilingual, this is a PR to add languages other than English to the language tag, to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.
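
For reference, this is the `language` block the PR adds to the README front matter, quoted from the diff below (the 13 explicitly listed languages, as ISO 639-3 codes):

```yaml
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
```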

Files changed (1)

README.md (+159 -145)
README.md CHANGED
@@ -1,146 +1,160 @@
- ---
- library_name: peft
- license: other
- base_model: Qwen/Qwen2.5-72B
- tags:
- - axolotl
- - generated_from_trainer
- datasets:
- - sumuks/openreview_wintermute_0.2_training_data
- model-index:
- - name: purple-wintermute-0.2-72b
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.6.0`
- ```yaml
- base_model: Qwen/Qwen2.5-72B
- hub_model_id: sumuks/purple-wintermute-0.2-72b
- trust_remote_code: true
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
- bf16: true
- hf_use_auth_token: true
-
- plugins:
- - axolotl.integrations.liger.LigerPlugin
- liger_rope: true
- liger_rms_norm: true
- liger_glu_activation: true
- liger_layer_norm: true
- liger_fused_linear_cross_entropy: true
- save_safetensors:
-
- datasets:
- - path: sumuks/openreview_wintermute_0.2_training_data
-   type: completion
-   field: text
- dataset_prepared_path: .axolotl_cache_data/wintermute_0.2
- shuffle_merged_datasets: true
- # dataset_exact_deduplication: true
- val_set_size: 0.005
- output_dir: ./../../outputs/purple-wintermute-0.2-72b
- push_dataset_to_hub: sumuks/purple_wintermute_0.2_training_data_in_progress
-
- sequence_length: 2048
- sample_packing: true
- pad_to_sequence_len: true
-
- adapter: lora
- lora_r: 256
- lora_alpha: 32
- lora_dropout: 0.05
- peft_use_rslora: true
- lora_target_linear: true
-
- gradient_accumulation_steps: 4
- micro_batch_size: 8
- eval_batch_size: 1
- num_epochs: 3
- learning_rate: 5e-5
- warmup_ratio: 0.05
- evals_per_epoch: 3
- saves_per_epoch: 5
- gradient_checkpointing: true
- lr_scheduler: cosine
- optimizer: paged_adamw_8bit
-
- profiler_steps: 100
- save_safetensors: true
- train_on_inputs: true
- wandb_project: wintermute
- wandb_name: purple-wintermute-0.2-72b
- deepspeed: deepspeed_configs/zero3_bf16.json
-
- ```
-
- </details><br>
-
- # purple-wintermute-0.2-72b
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-72B](https://huggingface.co/Qwen/Qwen2.5-72B) on the sumuks/openreview_wintermute_0.2_training_data dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.3017
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 8
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 256
- - total_eval_batch_size: 8
- - optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 388
- - num_epochs: 3
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | No log | 0.0004 | 1 | 2.5112 |
- | 1.3654 | 0.3333 | 864 | 1.6504 |
- | 0.9929 | 0.6665 | 1728 | 1.4144 |
- | 0.9039 | 0.9998 | 2592 | 1.3083 |
- | 0.8161 | 1.3333 | 3456 | 1.2935 |
- | 0.7815 | 1.6665 | 4320 | 1.2816 |
- | 0.7658 | 1.9998 | 5184 | 1.2775 |
- | 0.7004 | 2.3333 | 6048 | 1.2995 |
- | 0.6694 | 2.6665 | 6912 | 1.3013 |
- | 0.6798 | 2.9998 | 7776 | 1.3017 |
-
-
- ### Framework versions
-
- - PEFT 0.14.0
- - Transformers 4.47.1
- - Pytorch 2.5.1
- - Datasets 3.2.0
+ ---
+ library_name: peft
+ license: other
+ base_model: Qwen/Qwen2.5-72B
+ tags:
+ - axolotl
+ - generated_from_trainer
+ datasets:
+ - sumuks/openreview_wintermute_0.2_training_data
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: purple-wintermute-0.2-72b
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.6.0`
+ ```yaml
+ base_model: Qwen/Qwen2.5-72B
+ hub_model_id: sumuks/purple-wintermute-0.2-72b
+ trust_remote_code: true
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+ bf16: true
+ hf_use_auth_token: true
+
+ plugins:
+ - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_glu_activation: true
+ liger_layer_norm: true
+ liger_fused_linear_cross_entropy: true
+ save_safetensors:
+
+ datasets:
+ - path: sumuks/openreview_wintermute_0.2_training_data
+   type: completion
+   field: text
+ dataset_prepared_path: .axolotl_cache_data/wintermute_0.2
+ shuffle_merged_datasets: true
+ # dataset_exact_deduplication: true
+ val_set_size: 0.005
+ output_dir: ./../../outputs/purple-wintermute-0.2-72b
+ push_dataset_to_hub: sumuks/purple_wintermute_0.2_training_data_in_progress
+
+ sequence_length: 2048
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ adapter: lora
+ lora_r: 256
+ lora_alpha: 32
+ lora_dropout: 0.05
+ peft_use_rslora: true
+ lora_target_linear: true
+
+ gradient_accumulation_steps: 4
+ micro_batch_size: 8
+ eval_batch_size: 1
+ num_epochs: 3
+ learning_rate: 5e-5
+ warmup_ratio: 0.05
+ evals_per_epoch: 3
+ saves_per_epoch: 5
+ gradient_checkpointing: true
+ lr_scheduler: cosine
+ optimizer: paged_adamw_8bit
+
+ profiler_steps: 100
+ save_safetensors: true
+ train_on_inputs: true
+ wandb_project: wintermute
+ wandb_name: purple-wintermute-0.2-72b
+ deepspeed: deepspeed_configs/zero3_bf16.json
+
+ ```
+
+ </details><br>
+
+ # purple-wintermute-0.2-72b
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-72B](https://huggingface.co/Qwen/Qwen2.5-72B) on the sumuks/openreview_wintermute_0.2_training_data dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.3017
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 8
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 256
+ - total_eval_batch_size: 8
+ - optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 388
+ - num_epochs: 3
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | No log | 0.0004 | 1 | 2.5112 |
+ | 1.3654 | 0.3333 | 864 | 1.6504 |
+ | 0.9929 | 0.6665 | 1728 | 1.4144 |
+ | 0.9039 | 0.9998 | 2592 | 1.3083 |
+ | 0.8161 | 1.3333 | 3456 | 1.2935 |
+ | 0.7815 | 1.6665 | 4320 | 1.2816 |
+ | 0.7658 | 1.9998 | 5184 | 1.2775 |
+ | 0.7004 | 2.3333 | 6048 | 1.2995 |
+ | 0.6694 | 2.6665 | 6912 | 1.3013 |
+ | 0.6798 | 2.9998 | 7776 | 1.3017 |
+
+
+ ### Framework versions
+
+ - PEFT 0.14.0
+ - Transformers 4.47.1
+ - Pytorch 2.5.1
+ - Datasets 3.2.0
  - Tokenizers 0.21.0
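
For context on the artifact this card describes, here is a minimal, hypothetical sketch of how the LoRA adapter could be loaded with PEFT on top of the base model. It assumes the adapter weights are published under `sumuks/purple-wintermute-0.2-72b` (the `hub_model_id` in the config above); the repository layout and hardware requirements are not covered by this commit, and a 72B base model requires multiple GPUs or heavy offloading.

```python
# Minimal sketch (not part of this commit): attach the LoRA adapter to the Qwen2.5-72B base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-72B"                      # base model from the config
adapter_id = "sumuks/purple-wintermute-0.2-72b"   # assumed adapter repo (hub_model_id above)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # training used bf16
    device_map="auto",           # shard across available GPUs
)
model = PeftModel.from_pretrained(base, adapter_id)

# The adapter was trained on plain text (type: completion), so prompt it as a base model,
# not as a chat model.
inputs = tokenizer("The reviewers raised the following concerns:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Since the config trains on raw `text` completions rather than chat turns, the adapter behaves as a continuation model; prompts should be framed as text to be continued.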