Update README.md
README.md CHANGED
@@ -22,6 +22,9 @@ Mini-Mixtral-v0.2 is a Mixture of Experts (MoE) made with the following models u
 * [unsloth/mistral-7b-v0.2](https://huggingface.co/unsloth/mistral-7b-v0.2)
 * [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
 
+<a href='https://ko-fi.com/S6S2UH2TC' target='_blank'><img height='38' style='border:0px;height:36px;' src='https://storage.ko-fi.com/cdn/kofi1.png?v=3' border='0' alt='Buy Me a Coffee at ko-fi.com' /></a>
+<a href='https://discord.gg/KFS229xD' target='_blank'><img width='140' height='500' style='border:0px;height:36px;' src='https://i.ibb.co/tqwznYM/Discord-button.png' border='0' alt='Join Our Discord!' /></a>
+
 ## 🧩 Configuration
 
 ```yaml
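The YAML recipe that opens at the end of this hunk is not shown in the diff. Purely as an illustration of what a frankenMoE recipe for these two experts usually looks like, and not the repo's actual file, a mergekit-moe config follows this shape; the `gate_mode`, `dtype`, and `positive_prompts` values below are assumptions.

```yaml
# Illustrative mergekit-moe recipe only -- the real configuration is in the
# README's Configuration section and is not reproduced in this diff.
base_model: unsloth/mistral-7b-v0.2        # shared backbone the expert FFNs are grafted onto
gate_mode: hidden                          # assumption: seed the router from prompt hidden states
dtype: bfloat16                            # assumption
experts:
  - source_model: unsloth/mistral-7b-v0.2
    positive_prompts:                      # hypothetical prompts that steer the router
      - "Continue the following text"
  - source_model: mistralai/Mistral-7B-Instruct-v0.2
    positive_prompts:
      - "Answer the user's question"
```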
@@ -77,10 +80,6 @@ print(outputs[0]["generated_text"])
 
 ## "[What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)"
 
-
-<a href='https://ko-fi.com/S6S2UH2TC' target='_blank'><img height='38' style='border:0px;height:36px;' src='https://storage.ko-fi.com/cdn/kofi1.png?v=3' border='0' alt='Buy Me a Coffee at ko-fi.com' /></a>
-<a href='https://discord.gg/KFS229xD' target='_blank'><img width='140' height='500' style='border:0px;height:36px;' src='https://i.ibb.co/tqwznYM/Discord-button.png' border='0' alt='Join Our Discord!' /></a>
-
 ### (from the MistralAI papers...click the quoted question above to navigate to it directly.)
 
 The scale of a model is one of the most important axes for better model quality. Given a fixed computing budget, training a larger model for fewer steps is better than training a smaller model for more steps.
@@ -113,4 +112,3 @@ If all our tokens are sent to just a few popular experts, that will make trainin
 ![image/gif](https://cdn-uploads.huggingface.co/production/uploads/6589d7e6586088fd2784a12c/43v7GezlOGg2BYljbU5ge.gif)
 ## "Wait...but you called this a frankenMoE?"
 The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously.
-```
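To make the routing idea in the context above concrete, here is a minimal, self-contained sketch of a top-2 MoE feed-forward layer. It is not the code behind this repo; the layer sizes and module names are arbitrary, and the comments about gate initialization restate the frankenMoE point quoted in the hunk.

```python
# Minimal top-2 MoE feed-forward layer (illustrative sketch only, not this repo's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoEFeedForward(nn.Module):
    def __init__(self, hidden_size: int = 4096, num_experts: int = 2):
        super().__init__()
        # One feed-forward "expert" per donor model. In a frankenMoE these weights are
        # copied from existing checkpoints instead of being trained from scratch.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.SiLU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )
        # The router ("gate") scores every token against every expert. In a conventional
        # MoE it is trained jointly with the experts; in a frankenMoE it is initialized
        # separately, e.g. from hidden states of hand-picked prompts.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        scores = self.gate(x)                               # (num_tokens, num_experts)
        weights, chosen = torch.topk(scores, k=2, dim=-1)   # each token keeps its 2 best experts
        weights = F.softmax(weights, dim=-1)                # normalize over the chosen pair
        out = torch.zeros_like(x)
        for slot in range(2):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                 # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Tiny usage example: route 5 random "tokens" through the layer.
layer = Top2MoEFeedForward(hidden_size=64, num_experts=2)
tokens = torch.randn(5, 64)
print(layer(tokens).shape)  # torch.Size([5, 64])
```

The load-balancing concern in the hunk context shows up here directly: if `self.gate` keeps picking the same expert for most tokens, the other experts sit idle, which is why trained MoEs add an auxiliary balancing loss and why frankenMoEs typically seed the gate from representative prompts instead.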