kajuma committed
Commit 0712123 (verified) · 1 Parent(s): 0053e07

Update README.md

Files changed (1):
  1. README.md (+54 -19)
README.md CHANGED
@@ -7,43 +7,78 @@ tags:
  - merge

  ---
- # merge_model_5
+ # QwQ-32B Kumo

  This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- ## Merge Details
- ### Merge Method
-
- This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [Qwen/QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) as a base.
-
- ### Models Merged
-
- The following models were included in the merge:
- * ./merge_model_3
- * ./merge_model_4
- * ./merge_model_2
- * ./merge_model_1
-
  ### Configuration

  The following YAML configuration was used to produce this model:

  ```yaml
+ merge_method: slerp
+ base_model: Qwen/QwQ-32B
+ models:
+   - model: Qwen/QwQ-32B
+   - model: NovaSky-AI/Sky-T1-32B-Flash
+ parameters:
+   t: 0.4
+ dtype: bfloat16
+ name: merge_model_1
+ ---
+ merge_method: breadcrumbs_ties
+ base_model: Qwen/QwQ-32B
+ tokenizer_source: Qwen/QwQ-32B
+ name: merge_model_2
+ models:
+   - model: Qwen/QwQ-32B
+     parameters:
+       weight: 1.0
+   - model: FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview
+     parameters:
+       weight: 0.75
+ dtype: bfloat16
+ ---
+ merge_method: task_arithmetic
+ base_model: Qwen/Qwen2.5-32B
+ tokenizer_source: Qwen/QwQ-32B
+ name: merge_model_3
+ models:
+   - model: rinna/deepseek-r1-distill-qwen2.5-bakeneko-32b
+     parameters:
+       weight: 1.0
+   - model: cyberagent/DeepSeek-R1-Distill-Qwen-32B-Japanese
+     parameters:
+       weight: 0.9
+ tokenizer_source: base
+ dtype: bfloat16
+ ---
+ merge_method: slerp
+ base_model: Qwen/QwQ-32B
+ models:
+   - model: Qwen/QwQ-32B
+   - model: TeamDelta/ABEJA-Qwen2.5-32B-base-jp-v0.1
+ parameters:
+   t: 0.5
+ tokenizer_source: base
+ dtype: bfloat16
+ name: merge_model_4
+ ---
  merge_method: model_stock

  base_model: Qwen/QwQ-32B

  models:
    - model: Qwen/QwQ-32B
-   - model: ./merge_model_1
-   - model: ./merge_model_2
-   - model: ./merge_model_3
-   - model: ./merge_model_4
+   - model: merge_model_1
+   - model: merge_model_2
+   - model: merge_model_3
+   - model: merge_model_4

  dtype: bfloat16

  pad_to_multiple_of: 512
  tokenizer_source: base

- name: SKYDRIVE-32B-v0.1
+ name: QwQ-32B-Kumo
  ```
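
The committed configuration is a multi-document YAML: four intermediate merges (merge_model_1 through merge_model_4) are produced first, and a final model_stock pass combines them with Qwen/QwQ-32B as the base. The sketch below is one way to drive those stages in order using the Python API shown in mergekit's own README (MergeConfiguration, run_merge, MergeOptions); the config filename, the output directory layout, and the assumption that later stages resolve the bare intermediate names as local paths are all illustrative rather than taken from this commit.

```python
# Sketch: run each YAML document of the multi-stage config in sequence.
# Assumes the config above is saved as config.yaml and that each stage is
# written to a directory named after its `name:` field, so later stages can
# pick up intermediates such as merge_model_1 as relative paths.
import torch
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", "r", encoding="utf-8") as fp:
    stages = list(yaml.safe_load_all(fp))

for stage in stages:
    # Pop `name` so it doubles as the output directory regardless of whether
    # the installed mergekit version accepts it as a config field.
    out_dir = stage.pop("name", "merged")  # merge_model_1 ... QwQ-32B-Kumo
    run_merge(
        MergeConfiguration.model_validate(stage),
        out_path=f"./{out_dir}",
        options=MergeOptions(
            cuda=torch.cuda.is_available(),
            copy_tokenizer=True,
        ),
    )
```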
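
Once merged, the result should load like any other QwQ/Qwen2.5-family causal LM with transformers. A minimal usage sketch follows; the repo id "kajuma/QwQ-32B-Kumo" is an assumption, so substitute the actual Hub repo or a local path to the merged weights.

```python
# Minimal sketch of loading and prompting the merged model with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kajuma/QwQ-32B-Kumo"  # assumed repo id; a local path also works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used in the merge config
    device_map="auto",
)

messages = [{"role": "user", "content": "How many r's are in the word strawberry?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```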