Update README.md
As some users had noted, particularly thanks to |GodZio| and The-Istar, the previous generation had trouble producing an EOS token and stopping its output.
Llama-3 Instruct alone has two distinct EOS tokens, and models like Hermes have their own EOS. What appears to have happened is that ALL of the EOS tokens basically lost weighting, because the weight got spread across the different EOS tokens, so the model did not know which EOS token to produce. Merge enough different models like this and you end up with no EOS token ever being generated. There are other issues at play too: Hermes has a different number of tokens, so the Hermes EOS does not actually make it into the merge at all, meaning models like Hermes effectively erase their EOS when merged against models with smaller heads. The puzzling part is why the Llama-3 format was so disproportionately affected by the merge. I don't have a clear answer for that at all.
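For illustration, a quick way to see the mismatch is to compare the tokenizers of the merge candidates. This is just a rough sketch; the repo names below are examples of a Llama-3 instruct model and a ChatML-style Hermes finetune, not necessarily the exact models in this merge.

```python
# Rough sketch: compare EOS tokens and vocab sizes across merge candidates.
# Repo names are illustrative examples, not the exact models used here.
from transformers import AutoTokenizer

candidates = [
    "meta-llama/Meta-Llama-3-8B-Instruct",   # stops on <|eot_id|> / <|end_of_text|>
    "NousResearch/Hermes-2-Pro-Llama-3-8B",  # ChatML-style, stops on <|im_end|>
]

for name in candidates:
    tok = AutoTokenizer.from_pretrained(name)
    print(name)
    print("  eos_token:", tok.eos_token, "-> id", tok.eos_token_id)
    # A vocab-size mismatch is how an EOS token can fail to make it into the merge.
    print("  vocab size:", len(tok))
```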
# New strategy (Still Evolutionary)
Alright. We're still doing the evolutionary thing: take the previous generation and either add or remove a model. However, this version was a bit of a rollercoaster. I spent a lot of time with modelstock trying to make sure I could preserve the EOS token, and it essentially boiled down to keeping only a handful of models, with a very sub-par result. I really was not a fan. So much so that I changed directions.
# Base Folding
We have competing objectives for the model. We want model diversity for interesting storytelling. That's really great. However, increasing model diversity also increases EOS diversity, and that's a problem, because we also want the model to be able to shut up when it wants to. The catch is that modelstock uses geometric interpolation, so with lots of different EOS tokens it ends up acting a bit like averaging them. That's a huge problem: the merge can't afford to disagree about which EOS to use. Hence, we're going to fold the bases together first, since they all share the same EOS tokens. Specifically, we're going to do a modelstock merge like this:
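Something along these lines, as a rough sketch of the idea (the model names below are placeholders for Llama-3 bases that all share the same EOS tokens, not the exact recipe):

```yaml
# Rough sketch of a model_stock base fold; model names are placeholders.
merge_method: model_stock
base_model: meta-llama/Meta-Llama-3-8B
models:
  - model: meta-llama/Meta-Llama-3-8B-Instruct
  - model: example-org/llama-3-8b-base-variant   # placeholder for another same-EOS base
dtype: bfloat16
```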