New base model. This one actually expects you to use the Llama 3 Instruct format. There's a lot to talk about here, and this model is TOTALLY different from the previous version for a variety of reasons.
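For reference, the standard Llama 3 Instruct prompt template looks like this (`{system_prompt}` and `{prompt}` are placeholders for your own text):

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```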
# Why a different approach?
As some users noted, with particular thanks to GodZiol and The-Istar, the previous Mira's instruct format was very unclear. In fact, when testing the Llama-3 Instruct format it seemed just broken, and it was. Why? Well, the issue was with merging multiple models that use different stopping tokens. I'll leave a technical explanation below for my assumption about why this happened.
# Possible cause of the issue (Technical)
Llama-3 Instruct alone has two distinct EOS tokens, and models like Hermes have their own EOS. What appears to have happened is that ALL of the EOS tokens basically lost weighting, because the weight got spread across the different EOS tokens, and this meant the model did not know which EOS token to produce. Merge enough different models like this and you end up with no EOS token ever being generated. There are other issues at play too: Hermes has a different number of tokens, so the Hermes EOS does not actually make it into the merge at all, meaning models like Hermes effectively erase the EOS when merging against smaller heads. The puzzling part of this is why the Llama-3 format was apparently so disproportionately affected by the merge. I don't have a clear answer for that at all.
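To make the dilution argument concrete, here's a minimal toy sketch. The vocab sizes, token ids, and the plain row-wise average are all made up for illustration; this is not the actual merge recipe, just the shape of the failure:

```python
import numpy as np

HIDDEN = 8
rng = np.random.default_rng(0)

def head(vocab, eos_id):
    # Fake lm_head: near-zero rows everywhere except a strong row
    # at the one EOS token this model was trained to emit.
    w = rng.normal(0.0, 0.01, (vocab, HIDDEN))
    w[eos_id] = 1.0
    return w

# Two Llama-3-style heads, each leaning on a different one of the two
# EOS ids, plus a Hermes-style head with a smaller vocab and its own EOS.
a = head(vocab=100, eos_id=90)  # strong on EOS #1
b = head(vocab=100, eos_id=95)  # strong on EOS #2
h = head(vocab=80,  eos_id=70)  # smaller vocab: rows 90/95 don't exist here

merged = a.copy()
merged[:80] = (a[:80] + b[:80] + h) / 3  # rows all three models share
merged[80:] = (a[80:] + b[80:]) / 2      # rows Hermes simply doesn't have

# Every EOS row ends up a fraction of the magnitude any single parent
# gave it, so no single EOS token stands out at sampling time.
for tok in (70, 90, 95):
    print(tok, round(float(np.linalg.norm(merged[tok])), 3))
```

In this toy, a single parent's EOS row has norm of roughly 2.83 (a row of ones in 8 dims); after the merge each EOS row is roughly halved or thirded, which is exactly the dilution described above.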