Nexesenex committed
Commit f07f18f · verified · 1 Parent(s): bdb719d

Update README.md

Files changed (1)
  1. README.md +13 -4
README.md CHANGED
@@ -11,14 +11,14 @@ license: llama3.3
 ---
 # about
 
-The Teaz series is my third attempt at making merges, this time on L3.x 70b, after the L3.2 3b Kostume and Kermes series.
+The Teaz series is my third attempt at making merges, this time on L3.x 70b, after the L3.2 3b Costume and Kermes series.
 
-This time, the goal was to make a smart model with a low perplexity, in accordance with the principles of the Kermes series, but with a merge of 3 merged models, as in the Kostume series.
+This time, the goal was to make a smart model with a low perplexity, in accordance with the principles of the Kermes series, but with a merge of 3 merged models, as in the Costume series.
 
 Huihui's abliterated models were used:
 - Llama 3.3 70b as the pivot of the first/main model.
-- Nemotron 3.1 70b and Deepseek R1 Distill 70b as the pillars.
-- and Tulu 3 70b as the backer of the 2nd and 3rd models.
+- Nemotron 3.1 70b and Deepseek R1 Distill 70b as the pillars of the main model, and the interlaced pivots/pillars of the 2nd and 3rd models.
+- and Tulu 3 70b as a second pillar of the 2nd and 3rd models.
 
 Bingo again. I hit 3.45 ppl512 wikieng, 62+ on ARC-C, and 82+ on ARC-E. Absolute top of the class for L3.x 70b, like Kermes is for L3.2 3b.
 
@@ -26,6 +26,15 @@ No cheating, no contaminating, just the wonderful MergeKit model-stock merge tec
 
 Next projects will involve that model as the "smarts pillar" of further merges, aimed at any use case.
 
+Edit: the methodology I use is actually partly a case of reinventing the wheel.
+
+- Mixing (finetuned) base models and (finetuned) instruct models,
+- and using 3 models (a base and 2 sidekicks),
+
+have already been described as optimal for model-stock merges by some enthusiasts.
+
+The new thing is to leverage this into a tree of merges with interlaced combinations. That's the natural development of the 2 aforementioned "rules".
+
 ---
 # further developments
 
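For readers curious about the general shape of the recipe in the diff above, here is a minimal, hypothetical sketch of the three-model model-stock step and the interlaced tree of merges, written as a small Python script that emits MergeKit configs. The repo IDs, file names, and the exact pivot/pillar assignments are assumptions for illustration, not the author's actual recipe.

```python
# Hypothetical sketch of the "tree of merges" idea: three first-level model_stock
# merges (each with a pivot plus two pillars), whose outputs could then be merged
# again. Repo IDs below are placeholders and may not match the exact models used.
import yaml  # pip install pyyaml

PIVOT = "huihui-ai/Llama-3.3-70B-Instruct-abliterated"                   # assumed repo id
NEMOTRON = "huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated"    # assumed repo id
R1_DISTILL = "huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated"       # assumed repo id
TULU = "huihui-ai/Llama-3.1-Tulu-3-70B-abliterated"                      # assumed repo id

def model_stock(base: str, pillars: list[str]) -> dict:
    """A standard MergeKit model_stock config: one base (pivot) plus the other models."""
    return {
        "merge_method": "model_stock",
        "base_model": base,
        "models": [{"model": m} for m in pillars],
        "dtype": "bfloat16",
    }

# First-level merges, with the pivots/pillars interlaced across the three configs.
tree = {
    "teaz_main.yaml": model_stock(PIVOT, [NEMOTRON, R1_DISTILL]),
    "teaz_side1.yaml": model_stock(NEMOTRON, [R1_DISTILL, TULU]),
    "teaz_side2.yaml": model_stock(R1_DISTILL, [NEMOTRON, TULU]),
}

for filename, config in tree.items():
    with open(filename, "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)
    # Each config can then be merged with MergeKit, e.g.:
    #   mergekit-yaml teaz_main.yaml ./teaz-main
```

A second-level model_stock pass over the three resulting models would complete the tree; which model plays pivot versus pillar at each level is only loosely described in the README, so the assignments above are guesses.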