---
license: apache-2.0
base_model: 01-ai/Yi-1.5-34B-32k
tags:
- generated_from_trainer
- axolotl
datasets:
- cognitivecomputations/Dolphin-2.9
- teknium/OpenHermes-2.5
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/dolphin-coder
- cognitivecomputations/samantha-data
- microsoft/orca-math-word-problems-200k
- Locutusque/function-calling-chatml
- internlm/Agent-FLAN
---

# Dolphin 2.9.3 Yi 1.5 34b 32k 🐬

Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations.

[![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/cognitivecomputations)
Discord: https://discord.gg/cognitivecomputations

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

Our appreciation for the sponsors of Dolphin 2.9.3:
- [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xH100 node
- [OnDemand](https://on-demand.io/) - provided inference sponsorship

This model is based on Yi-1.5-34B-32k and is governed by the Apache 2.0 license.

The base model has a 32k context window; our fine-tuning used an 8192-token sequence length.

Dolphin 2.9.3 uses the ChatML prompt template format.

Example:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

```

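Below is a minimal inference sketch using Hugging Face `transformers`. It assumes the tokenizer's bundled chat template emits the ChatML format shown above; the `model_id` value is a placeholder for this repository's id, and the generation settings are illustrative rather than recommended defaults.

```python
# Minimal sketch: build a ChatML prompt via the tokenizer's chat template
# and generate a reply. model_id is a placeholder -- point it at this repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.3-yi-1.5-34b-32k"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a haiku about dolphins."},
]

# add_generation_prompt=True appends the trailing "<|im_start|>assistant" turn.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
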
Dolphin 2.9.3 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service; it will be highly compliant with any request, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.

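The alignment layer itself is left to the deployer. One possible shape for it, sketched below with a hypothetical `violates_policy` check (a stand-in for whatever moderation model or rules engine you actually use), is a strict system prompt plus moderation on both the request and the reply:

```python
# Hypothetical guardrail wrapper -- not part of this repository.
# violates_policy is a toy stand-in; swap in a real moderation classifier.
GUARD_SYSTEM_PROMPT = (
    "You are Dolphin, a helpful AI assistant. "
    "Refuse requests for illegal or harmful content."
)

BLOCKLIST = ("example banned phrase",)  # toy placeholder

def violates_policy(text: str) -> bool:
    """Stand-in for an external moderation model or rules engine."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def guarded_reply(generate_fn, user_prompt: str) -> str:
    """Screen the request, generate with a strict system prompt, screen the reply."""
    refusal = "I can't help with that."
    if violates_policy(user_prompt):
        return refusal
    reply = generate_fn(system=GUARD_SYSTEM_PROMPT, user=user_prompt)
    return refusal if violates_policy(reply) else reply
```
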
Dolphin is licensed under the Apache 2.0 license. We grant permission for any use, including commercial. Dolphin was trained on data generated from GPT-4, among other models.

## Evals

![image/png](https://i.ibb.co/7G02dNq/file-9-Lfkfpd0-KKK3-USTm-U8d-Jg-Zm0.png)

## Training

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: 01-ai/Yi-1.5-34B-32k
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true

# load_in_8bit: false
load_in_4bit: true
# strict: false

adapter: qlora
lora_modules_to_save: [embed_tokens, lm_head]

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: false
lora_fan_in_fan_out:

datasets:
  - path: /workspace/datasets/dolphin-2.9.3/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/SystemChat_filtered_sharegpt.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/SystemChat_multilingual_sharegpt.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/not_samantha_norefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/agent_instruct_react_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbench_instruct_j1s1_3k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbench_negative_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbench_react_10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbench_tflan_cot_30p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml

chat_template: chatml

dataset_prepared_path: dolphin-2.9.3-yi34b-prepared
val_set_size: 0.01
output_dir: ./dolphin-2.9.3-out

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: dolphin-2.9.3-yi-1.5-34b
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
# evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 4
save_total_limit: 2
save_steps:
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<|startoftext|>"
  eos_token: "<|im_end|>"
  pad_token: "<unk>"
  unk_token: "<unk>"
tokens:
  - "<|im_start|>"

#unfrozen_parameters:
lora_target_modules:
# input_layernorm layers
# - model.layers.0.input_layernorm
# - model.layers.1.input_layernorm
# - model.layers.2.input_layernorm
# - model.layers.3.input_layernorm
# - model.layers.4.input_layernorm
# - model.layers.5.input_layernorm
# - model.layers.6.input_layernorm
# - model.layers.7.input_layernorm
# - model.layers.8.input_layernorm
# - model.layers.9.input_layernorm
# - model.layers.10.input_layernorm
# - model.layers.11.input_layernorm
# - model.layers.12.input_layernorm
# - model.layers.13.input_layernorm
# - model.layers.14.input_layernorm
# - model.layers.15.input_layernorm
# - model.layers.16.input_layernorm
# - model.layers.17.input_layernorm
# - model.layers.18.input_layernorm
# - model.layers.19.input_layernorm
# - model.layers.20.input_layernorm
# - model.layers.21.input_layernorm
# - model.layers.22.input_layernorm
# - model.layers.23.input_layernorm
# - model.layers.24.input_layernorm
# - model.layers.25.input_layernorm
# - model.layers.26.input_layernorm
# - model.layers.27.input_layernorm
# - model.layers.28.input_layernorm
# - model.layers.29.input_layernorm
- lm_head
# mlp.down_proj layers
- model.layers.44.mlp.down_proj
- model.layers.45.mlp.down_proj
- model.layers.46.mlp.down_proj
- model.layers.47.mlp.down_proj
- model.layers.43.mlp.down_proj
- model.layers.48.mlp.down_proj
- model.layers.49.mlp.down_proj
- model.layers.42.mlp.down_proj
- model.layers.50.mlp.down_proj
- model.layers.41.mlp.down_proj
- model.layers.51.mlp.down_proj
- model.layers.52.mlp.down_proj
- model.layers.39.mlp.down_proj
- model.layers.40.mlp.down_proj
- model.layers.53.mlp.down_proj
- model.layers.54.mlp.down_proj
- model.layers.38.mlp.down_proj
- model.layers.56.mlp.down_proj
- model.layers.55.mlp.down_proj
- model.layers.37.mlp.down_proj
- model.layers.36.mlp.down_proj
- model.layers.57.mlp.down_proj
- model.layers.35.mlp.down_proj
- model.layers.12.mlp.down_proj
- model.layers.13.mlp.down_proj
- model.layers.16.mlp.down_proj
- model.layers.14.mlp.down_proj
- model.layers.11.mlp.down_proj
- model.layers.34.mlp.down_proj
- model.layers.17.mlp.down_proj
# mlp.gate_proj layers
- model.layers.57.mlp.gate_proj
- model.layers.58.mlp.gate_proj
- model.layers.56.mlp.gate_proj
- model.layers.55.mlp.gate_proj
- model.layers.54.mlp.gate_proj
- model.layers.35.mlp.gate_proj
- model.layers.34.mlp.gate_proj
- model.layers.53.mlp.gate_proj
- model.layers.26.mlp.gate_proj
- model.layers.52.mlp.gate_proj
- model.layers.25.mlp.gate_proj
- model.layers.33.mlp.gate_proj
- model.layers.51.mlp.gate_proj
- model.layers.18.mlp.gate_proj
- model.layers.32.mlp.gate_proj
- model.layers.36.mlp.gate_proj
- model.layers.24.mlp.gate_proj
- model.layers.17.mlp.gate_proj
- model.layers.23.mlp.gate_proj
- model.layers.31.mlp.gate_proj
- model.layers.50.mlp.gate_proj
- model.layers.19.mlp.gate_proj
- model.layers.15.mlp.gate_proj
- model.layers.27.mlp.gate_proj
- model.layers.37.mlp.gate_proj
- model.layers.14.mlp.gate_proj
- model.layers.39.mlp.gate_proj
- model.layers.11.mlp.gate_proj
- model.layers.29.mlp.gate_proj
- model.layers.28.mlp.gate_proj
# mlp.up_proj layers
- model.layers.21.mlp.up_proj
- model.layers.48.mlp.up_proj
- model.layers.49.mlp.up_proj
- model.layers.24.mlp.up_proj
- model.layers.47.mlp.up_proj
- model.layers.25.mlp.up_proj
- model.layers.23.mlp.up_proj
- model.layers.50.mlp.up_proj
- model.layers.14.mlp.up_proj
- model.layers.46.mlp.up_proj
- model.layers.26.mlp.up_proj
- model.layers.27.mlp.up_proj
- model.layers.20.mlp.up_proj
- model.layers.13.mlp.up_proj
- model.layers.51.mlp.up_proj
- model.layers.28.mlp.up_proj
- model.layers.45.mlp.up_proj
- model.layers.22.mlp.up_proj
- model.layers.52.mlp.up_proj
- model.layers.12.mlp.up_proj
- model.layers.29.mlp.up_proj
- model.layers.44.mlp.up_proj
- model.layers.53.mlp.up_proj
- model.layers.11.mlp.up_proj
- model.layers.42.mlp.up_proj
- model.layers.30.mlp.up_proj
- model.layers.43.mlp.up_proj
- model.layers.19.mlp.up_proj
- model.layers.54.mlp.up_proj
- model.layers.40.mlp.up_proj
- model.embed_tokens
# model.norm layers
# post_attention_layernorm layers
# - model.layers.0.post_attention_layernorm
# - model.layers.1.post_attention_layernorm
# - model.layers.2.post_attention_layernorm
# - model.layers.3.post_attention_layernorm
# - model.layers.4.post_attention_layernorm
# - model.layers.5.post_attention_layernorm
# - model.layers.6.post_attention_layernorm
# - model.layers.7.post_attention_layernorm
# - model.layers.8.post_attention_layernorm
# - model.layers.9.post_attention_layernorm
# - model.layers.10.post_attention_layernorm
# - model.layers.11.post_attention_layernorm
# - model.layers.12.post_attention_layernorm
# - model.layers.13.post_attention_layernorm
# - model.layers.14.post_attention_layernorm
# - model.layers.15.post_attention_layernorm
# - model.layers.16.post_attention_layernorm
# - model.layers.17.post_attention_layernorm
# - model.layers.18.post_attention_layernorm
# - model.layers.19.post_attention_layernorm
# - model.layers.20.post_attention_layernorm
# - model.layers.21.post_attention_layernorm
# - model.layers.22.post_attention_layernorm
# - model.layers.23.post_attention_layernorm
# - model.layers.24.post_attention_layernorm
# - model.layers.25.post_attention_layernorm
# - model.layers.26.post_attention_layernorm
# - model.layers.27.post_attention_layernorm
# - model.layers.28.post_attention_layernorm
# - model.layers.29.post_attention_layernorm
# self_attn.k_proj layers
- model.layers.55.self_attn.k_proj
- model.layers.51.self_attn.k_proj
- model.layers.53.self_attn.k_proj
- model.layers.56.self_attn.k_proj
- model.layers.54.self_attn.k_proj
- model.layers.57.self_attn.k_proj
- model.layers.52.self_attn.k_proj
- model.layers.59.self_attn.k_proj
- model.layers.49.self_attn.k_proj
- model.layers.48.self_attn.k_proj
- model.layers.47.self_attn.k_proj
- model.layers.41.self_attn.k_proj
- model.layers.58.self_attn.k_proj
- model.layers.40.self_attn.k_proj
- model.layers.46.self_attn.k_proj
- model.layers.44.self_attn.k_proj
- model.layers.50.self_attn.k_proj
- model.layers.43.self_attn.k_proj
- model.layers.39.self_attn.k_proj
- model.layers.42.self_attn.k_proj
- model.layers.45.self_attn.k_proj
- model.layers.33.self_attn.k_proj
- model.layers.37.self_attn.k_proj
- model.layers.17.self_attn.k_proj
- model.layers.24.self_attn.k_proj
- model.layers.21.self_attn.k_proj
- model.layers.25.self_attn.k_proj
- model.layers.23.self_attn.k_proj
- model.layers.35.self_attn.k_proj
- model.layers.20.self_attn.k_proj
# self_attn.o_proj layers
- model.layers.53.self_attn.o_proj
- model.layers.55.self_attn.o_proj
- model.layers.54.self_attn.o_proj
- model.layers.42.self_attn.o_proj
- model.layers.52.self_attn.o_proj
- model.layers.51.self_attn.o_proj
- model.layers.50.self_attn.o_proj
- model.layers.1.self_attn.o_proj
- model.layers.40.self_attn.o_proj
- model.layers.37.self_attn.o_proj
- model.layers.34.self_attn.o_proj
- model.layers.36.self_attn.o_proj
- model.layers.41.self_attn.o_proj
- model.layers.35.self_attn.o_proj
- model.layers.46.self_attn.o_proj
- model.layers.27.self_attn.o_proj
- model.layers.33.self_attn.o_proj
- model.layers.30.self_attn.o_proj
- model.layers.43.self_attn.o_proj
- model.layers.39.self_attn.o_proj
- model.layers.17.self_attn.o_proj
- model.layers.28.self_attn.o_proj
- model.layers.48.self_attn.o_proj
- model.layers.31.self_attn.o_proj
- model.layers.29.self_attn.o_proj
- model.layers.38.self_attn.o_proj
- model.layers.47.self_attn.o_proj
- model.layers.56.self_attn.o_proj
- model.layers.32.self_attn.o_proj
- model.layers.4.self_attn.o_proj
# self_attn.q_proj layers
- model.layers.1.self_attn.q_proj
- model.layers.3.self_attn.q_proj
- model.layers.4.self_attn.q_proj
- model.layers.5.self_attn.q_proj
- model.layers.2.self_attn.q_proj
- model.layers.0.self_attn.q_proj
- model.layers.6.self_attn.q_proj
- model.layers.8.self_attn.q_proj
- model.layers.7.self_attn.q_proj
- model.layers.10.self_attn.q_proj
- model.layers.36.self_attn.q_proj
- model.layers.11.self_attn.q_proj
- model.layers.9.self_attn.q_proj
- model.layers.35.self_attn.q_proj
- model.layers.28.self_attn.q_proj
- model.layers.34.self_attn.q_proj
- model.layers.27.self_attn.q_proj
- model.layers.14.self_attn.q_proj
- model.layers.29.self_attn.q_proj
- model.layers.12.self_attn.q_proj
- model.layers.33.self_attn.q_proj
- model.layers.30.self_attn.q_proj
- model.layers.24.self_attn.q_proj
- model.layers.32.self_attn.q_proj
- model.layers.37.self_attn.q_proj
- model.layers.20.self_attn.q_proj
- model.layers.15.self_attn.q_proj
- model.layers.16.self_attn.q_proj
- model.layers.26.self_attn.q_proj
- model.layers.31.self_attn.q_proj
# self_attn.v_proj layers
- model.layers.7.self_attn.v_proj
- model.layers.8.self_attn.v_proj
- model.layers.9.self_attn.v_proj
- model.layers.10.self_attn.v_proj
- model.layers.12.self_attn.v_proj
- model.layers.13.self_attn.v_proj
- model.layers.14.self_attn.v_proj
- model.layers.15.self_attn.v_proj
- model.layers.16.self_attn.v_proj
- model.layers.17.self_attn.v_proj
- model.layers.21.self_attn.v_proj
- model.layers.23.self_attn.v_proj
- model.layers.39.self_attn.v_proj
- model.layers.46.self_attn.v_proj
- model.layers.48.self_attn.v_proj
- model.layers.49.self_attn.v_proj
- model.layers.51.self_attn.v_proj
- model.layers.52.self_attn.v_proj
- model.layers.53.self_attn.v_proj
- model.layers.54.self_attn.v_proj
- model.layers.55.self_attn.v_proj
- model.layers.56.self_attn.v_proj
- model.layers.22.self_attn.v_proj
- model.layers.18.self_attn.v_proj
- model.layers.50.self_attn.v_proj
- model.layers.47.self_attn.v_proj
- model.layers.44.self_attn.v_proj
- model.layers.45.self_attn.v_proj
- model.layers.57.self_attn.v_proj
- model.layers.41.self_attn.v_proj

```

</details><br>

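For readers who do not use axolotl, the quantization and adapter settings above map roughly onto plain `bitsandbytes` + `peft`, as sketched below. This is an approximation for orientation, not the training script; the `target_modules` list is truncated here, whereas the config enumerates the full set of attention and MLP projections.

```python
# Rough bitsandbytes + peft equivalent of the QLoRA fields above
# (load_in_4bit, lora_r, lora_alpha, lora_dropout, lora_modules_to_save).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "01-ai/Yi-1.5-34B-32k",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    modules_to_save=["embed_tokens", "lm_head"],
    target_modules=[  # truncated -- see the full list in the config above
        "model.layers.44.mlp.down_proj",
        "model.layers.57.mlp.gate_proj",
        "model.layers.21.mlp.up_proj",
        "model.layers.55.self_attn.k_proj",
        "model.layers.53.self_attn.o_proj",
        "model.layers.1.self_attn.q_proj",
        "model.layers.7.self_attn.v_proj",
    ],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```
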
# out-yi

This model is a fine-tuned version of [01-ai/Yi-1.5-34B-32k](https://huggingface.co/01-ai/Yi-1.5-34B-32k), trained on the datasets listed above.
It achieves the following results on the evaluation set:
- Loss: 0.4425

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3

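As a sanity check, the effective batch size above follows directly from the per-device batch size, the gradient accumulation steps, and the device count:

```python
# total_train_batch_size = per-device batch x grad accumulation x number of GPUs
micro_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 8
assert micro_batch_size * gradient_accumulation_steps * num_devices == 64
```
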
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.6265 | 0.0 | 1 | 0.6035 |
| 0.4674 | 0.25 | 327 | 0.4344 |
| 0.4337 | 0.5 | 654 | 0.4250 |
| 0.4346 | 0.75 | 981 | 0.4179 |
| 0.3985 | 1.0 | 1308 | 0.4118 |
| 0.3128 | 1.23 | 1635 | 0.4201 |
| 0.3261 | 1.48 | 1962 | 0.4157 |
| 0.3259 | 1.73 | 2289 | 0.4122 |
| 0.3126 | 1.98 | 2616 | 0.4079 |
| 0.2265 | 2.21 | 2943 | 0.4441 |
| 0.2297 | 2.46 | 3270 | 0.4427 |
| 0.2424 | 2.71 | 3597 | 0.4425 |

### Framework versions

- Transformers 4.40.0.dev0
- PyTorch 2.2.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0