Update README.md
README.md CHANGED
@@ -11,14 +11,14 @@ license: llama3.3
---
# about

- The Teaz series is my third attempt at making merges, this time on L3.x 70b, after the L3.2 3b
+ The Teaz series is my third attempt at making merges, this time on L3.x 70b, after the L3.2 3b Costume and Kermes series.

- This time, the goal was to make a smart model with a low perplexity, in accordance to the principles of the Kermes series, but with a merge of 3 merged models like on the
+ This time, the goal was to make a smart model with low perplexity, in accordance with the principles of the Kermes series, but with a merge of 3 merged models, as with the Costume series.

Huihui's abliterated models were used:
- Llama 3.3 70b as the pivot of the first/main model.
- - Nemotron 3.1 70b and Deepseek R1 Distill 70b as the pillars.
+ - Nemotron 3.1 70b and Deepseek R1 Distill 70b as the pillars of the main model, and the interlaced pivots/pillar of the 2nd and 3rd models.
- - and Tulu 3 70b as
+ - and Tulu 3 70b as a second pillar of the 2nd and 3rd models.

Bingo again. I hit 3.45 ppl512 wikieng, 62+ on ARC-C, and 82+ on ARC-E. Absolute top of the class for L3.x 70b, like Kermes is for L3.2 3b.

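To make the "pivot + pillars" wording above concrete, here is a minimal sketch of what a MergeKit model-stock config for the first/main model could look like, generated from Python. This is not the author's actual recipe, and the Huihui repo IDs are illustrative guesses rather than confirmed names.

```python
# Hypothetical sketch: a mergekit "model_stock" config for the main model,
# with the Llama 3.3 70b abliterated checkpoint as the pivot (base_model)
# and the Nemotron / DeepSeek R1 Distill abliterated checkpoints as pillars.
# Repo IDs are assumptions, for illustration only.
import yaml  # PyYAML

main_model_config = {
    "merge_method": "model_stock",
    "base_model": "huihui-ai/Llama-3.3-70B-Instruct-abliterated",  # pivot (assumed repo ID)
    "models": [
        {"model": "huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated"},  # pillar (assumed)
        {"model": "huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated"},        # pillar (assumed)
    ],
    "dtype": "bfloat16",
}

# Write the config; the merge itself would then be run with mergekit, e.g.:
#   mergekit-yaml teaz-main.yaml ./teaz-main
with open("teaz-main.yaml", "w") as f:
    yaml.safe_dump(main_model_config, f, sort_keys=False)
```

In MergeKit's model-stock method the base_model acts as the anchor that the listed checkpoints are averaged against, which is roughly what the "pivot vs. pillars" framing above refers to.
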
@@ -26,6 +26,15 @@ No cheating, no contaminating, just the wonderful MergeKit model-stock merge tec

Next projects will involve that model as the "smarts pillar" of further merges, aimed at any use case.

+ Edit: the methodology I use is actually partly rediscovered hot water (i.e., nothing new in itself).
+
+ - Mixing (finetuned) base and (finetuned) instruct models,
+ - and using 3 models (a base, 2 sidekicks),
+
+ have been described as optimal for model-stock merging by some enthusiasts already.
+
+ The new thing is to leverage this into a tree of merges with interlaced combinations. That's the natural development of the 2 aforementioned "rules".
+
---
# further developments

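As one possible reading of the "tree of merges with interlaced combinations" described in the edit above, the sketch below builds three intermediate model-stock merges from the same pool of abliterated models, rotating which checkpoint acts as pivot, and then merges the three intermediates into a final model. The pairings, stage names, and repo IDs are assumptions made for illustration; the author's exact tree may differ.

```python
# Hypothetical sketch of a merge tree with interlaced combinations:
# three intermediate model_stock merges over the same pool of checkpoints
# (rotating the pivot), then a final model_stock merge of the intermediates.
# All names, pairings, and repo IDs below are assumptions for illustration.
import yaml  # PyYAML

POOL = {
    "llama":    "huihui-ai/Llama-3.3-70B-Instruct-abliterated",              # assumed repo ID
    "nemotron": "huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated",  # assumed repo ID
    "deepseek": "huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated",       # assumed repo ID
    "tulu":     "huihui-ai/Llama-3.1-Tulu-3-70B-abliterated",                # assumed repo ID
}

def model_stock(pivot, pillars):
    """One mergekit model_stock stage: a pivot (base_model) plus its pillars."""
    return {
        "merge_method": "model_stock",
        "base_model": pivot,
        "models": [{"model": m} for m in pillars],
        "dtype": "bfloat16",
    }

# Main model: Llama 3.3 as pivot, Nemotron + DeepSeek as pillars.
# 2nd/3rd models: Nemotron and DeepSeek take turns as pivot (interlaced),
# with Tulu 3 as the second pillar, following the description above.
stages = {
    "teaz-main":  model_stock(POOL["llama"],    [POOL["nemotron"], POOL["deepseek"]]),
    "teaz-aux-1": model_stock(POOL["nemotron"], [POOL["deepseek"], POOL["tulu"]]),
    "teaz-aux-2": model_stock(POOL["deepseek"], [POOL["nemotron"], POOL["tulu"]]),
}
# Final merge over the three intermediate outputs (local paths from the runs above).
stages["teaz-final"] = model_stock("./teaz-main", ["./teaz-aux-1", "./teaz-aux-2"])

for name, cfg in stages.items():
    with open(f"{name}.yaml", "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)
    # each stage would be built in order with, e.g.: mergekit-yaml <name>.yaml ./<name>
```

The interlacing here means each checkpoint appears in more than one intermediate merge, once as pivot and once as pillar, so the final merge of the three intermediates blends the same ingredients from different anchor points.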