Update README.md
README.md CHANGED
@@ -11,14 +11,14 @@ license: llama3.3
---
# about

- The Teaz series is my third attempt at making merges, this time on L3.x 70b, after the L3.2 3b
+ The Teaz series is my third attempt at making merges, this time on L3.x 70b, after the L3.2 3b Costume and Kermes series.

- This time, the goal was to make a smart model with a low perplexity, in accordance to the principles of the Kermes series, but with a merge of 3 merged models like on the
+ This time, the goal was to make a smart model with low perplexity, in accordance with the principles of the Kermes series, but with a merge of 3 merged models, as with the Costume series.

Huihui's abliterated models were used:
- Llama 3.3 70b as the pivot of the first/main model.
- - Nemotron 3.1 70b and Deepseek R1 Distill 70b as the pillars.
+ - Nemotron 3.1 70b and Deepseek R1 Distill 70b as the pillars of the main model, and the interlaced pivots/pillar of the 2nd and 3rd models.
- - and Tulu 3 70b as
+ - and Tulu 3 70b as a second pillar of the 2nd and 3rd models.

Bingo again. I hit 3.45 ppl512 wikieng, 62+ on ARC-C, and 82+ on ARC-E. Absolute top of the class for L3.x 70b, like Kermes is for L3.2 3b.

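To make the "pivot + pillars" wording above concrete, here is a minimal sketch of what a MergeKit model-stock config for the first/main model could look like, generated from Python. This is not the author's actual recipe, and the Huihui repo IDs are illustrative guesses rather than confirmed names.

```python
# Hypothetical sketch: a mergekit "model_stock" config for the main model,
# with the Llama 3.3 70b abliterated checkpoint as the pivot (base_model)
# and the Nemotron / DeepSeek R1 Distill abliterated checkpoints as pillars.
# Repo IDs are assumptions, for illustration only.
import yaml  # PyYAML

main_model_config = {
    "merge_method": "model_stock",
    "base_model": "huihui-ai/Llama-3.3-70B-Instruct-abliterated",  # pivot (assumed repo ID)
    "models": [
        {"model": "huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated"},  # pillar (assumed)
        {"model": "huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated"},        # pillar (assumed)
    ],
    "dtype": "bfloat16",
}

# Write the config; the merge itself would then be run with mergekit, e.g.:
#   mergekit-yaml teaz-main.yaml ./teaz-main
with open("teaz-main.yaml", "w") as f:
    yaml.safe_dump(main_model_config, f, sort_keys=False)
```

In MergeKit's model-stock method the base_model acts as the anchor that the listed checkpoints are averaged against, which is roughly what the "pivot vs. pillars" framing above refers to.
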
@@ -26,6 +26,15 @@ No cheating, no contaminating, just the wonderful MergeKit model-stock merge tec

Next projects will involve that model as the "smarts pillar" of further merges, aimed at any use case.

+ Edit: the methodology I use is actually partly rediscovered hot water (i.e., nothing new in itself).
+
+ - Mixing (finetuned) base and (finetuned) instruct models,
+ - and using 3 models (a base, 2 sidekicks),
+
+ have been described as optimal for model-stock merging by some enthusiasts already.
+
+ The new thing is to leverage this into a tree of merges with interlaced combinations. That's the natural development of the 2 aforementioned "rules".
+
---
# further developments

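As one possible reading of the "tree of merges with interlaced combinations" described in the edit above, the sketch below builds three intermediate model-stock merges from the same pool of abliterated models, rotating which checkpoint acts as pivot, and then merges the three intermediates into a final model. The pairings, stage names, and repo IDs are assumptions made for illustration; the author's exact tree may differ.

```python
# Hypothetical sketch of a merge tree with interlaced combinations:
# three intermediate model_stock merges over the same pool of checkpoints
# (rotating the pivot), then a final model_stock merge of the intermediates.
# All names, pairings, and repo IDs below are assumptions for illustration.
import yaml  # PyYAML

POOL = {
    "llama":    "huihui-ai/Llama-3.3-70B-Instruct-abliterated",              # assumed repo ID
    "nemotron": "huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated",  # assumed repo ID
    "deepseek": "huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated",       # assumed repo ID
    "tulu":     "huihui-ai/Llama-3.1-Tulu-3-70B-abliterated",                # assumed repo ID
}

def model_stock(pivot, pillars):
    """One mergekit model_stock stage: a pivot (base_model) plus its pillars."""
    return {
        "merge_method": "model_stock",
        "base_model": pivot,
        "models": [{"model": m} for m in pillars],
        "dtype": "bfloat16",
    }

# Main model: Llama 3.3 as pivot, Nemotron + DeepSeek as pillars.
# 2nd/3rd models: Nemotron and DeepSeek take turns as pivot (interlaced),
# with Tulu 3 as the second pillar, following the description above.
stages = {
    "teaz-main":  model_stock(POOL["llama"],    [POOL["nemotron"], POOL["deepseek"]]),
    "teaz-aux-1": model_stock(POOL["nemotron"], [POOL["deepseek"], POOL["tulu"]]),
    "teaz-aux-2": model_stock(POOL["deepseek"], [POOL["nemotron"], POOL["tulu"]]),
}
# Final merge over the three intermediate outputs (local paths from the runs above).
stages["teaz-final"] = model_stock("./teaz-main", ["./teaz-aux-1", "./teaz-aux-2"])

for name, cfg in stages.items():
    with open(f"{name}.yaml", "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)
    # each stage would be built in order with, e.g.: mergekit-yaml <name>.yaml ./<name>
```

The interlacing here means each checkpoint appears in more than one intermediate merge, once as pivot and once as pillar, so the final merge of the three intermediates blends the same ingredients from different anchor points.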