Blackroot committed on
Commit
570df46
·
verified ·
1 Parent(s): 49299d3

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -34,7 +34,7 @@ Stock for the "True Merge" -- This was a TIES Merge, the reasoning is explained
 
 
  # Why a different approach?
- As some users had noted, particularly thanks to |GodZiol| and The-Istar, the previous Mirai's instruct format was very unclear. Infact, when testing Llama-3 instruct format it seemed just broken, and, it was. Why? Well, the issue was with merging multiple models with different stopping tokens. I'll leave a tecnical explanation below for my assumption about why this happened. The long story short, I changed strategies for this model. It's very different, and expects the Llama-3 format to be used.
+ As some users had noted, particularly thanks to |GodZio| and The-Istar, the previous Mirai's instruct format was very unclear. Infact, when testing Llama-3 instruct format it seemed just broken, and, it was. Why? Well, the issue was with merging multiple models with different stopping tokens. I'll leave a tecnical explanation below for my assumption about why this happened. The long story short, I changed strategies for this model. It's very different, and expects the Llama-3 format to be used.
 
  # Possible cause of the issue (Technical)
  Llama-3 instruct alone has two distinct EOS tokens, and models like Hermes have it's own EOS. What appears to have happened is that ALL of the EOS tokens basically lost weighting because the weights spread around different EOS tokens. And, this meant that the model did not know which EOS token to produce. Merge enough different models like this, and you end up with no EOS token ever being generated. There's other issues at play, hermes has a different number of tokens, so the hermes EOS does not actually make it into the merge at all, meaning models like Hermes effectively erase the EOS when merging against smaller heads. The puzzling part of this is why Llama-3 format was apparently so disproportionately affected by the merge. I don't have a clear answer for this at all.
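
The dilution argument in the "Possible cause of the issue" paragraph can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the four-token vocabulary, the logit values, the model names, and the use of a plain equal-weight average in place of whatever merge method was actually used. It only shows how three models that each stop reliably with their own EOS token could average into one that never picks any EOS under greedy decoding.

```python
# Toy illustration only: the vocabulary, logits, and model names are made up,
# and a plain average stands in for the real merge method.

TOKENS = ["word", "eos_a", "eos_b", "eos_c"]

# Hypothetical end-of-turn logits for three fine-tunes, each trained to stop
# with a different EOS token. "word" is an ordinary continuation token.
model_logits = {
    "model_a": [3.0, 6.0, 0.0, 0.0],
    "model_b": [3.0, 0.0, 6.0, 0.0],
    "model_c": [3.0, 0.0, 0.0, 6.0],
}


def greedy_token(logits):
    """Return the name of the highest-scoring token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return TOKENS[best]


def linear_merge(all_logits):
    """Average the logits token by token (equal-weight linear merge)."""
    n = len(all_logits)
    return [sum(column) / n for column in zip(*all_logits)]


merged = linear_merge(list(model_logits.values()))

# Each source model picks its own EOS token at the end of a turn.
for name, logits in model_logits.items():
    print(name, "->", greedy_token(logits))

# In the merge, the EOS mass is split three ways and "word" wins instead.
print("merged ->", greedy_token(merged), merged)
```

In this toy setup each EOS token keeps only a third of its original score while the ordinary token keeps all of its score, so the merged model keeps generating. It does not model the vocabulary-size mismatch mentioned for Hermes, and it does not explain why the Llama-3 format was hit hardest.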