Update README.md
README.md CHANGED

@@ -50,7 +50,7 @@ This model was fully fine-tuned with BF16 on first 20k rows of `FreedomIntellige
 
 - **Base Model**: Qwen3-0.6B, pre-trained by the Qwen team; experts pre-trained by the Suayptalha team.
 - **Conversion**: The model copies embeddings, self-attention, and normalization weights from Qwen3-0.6B, replacing the MLP layers with MoE layers (3 experts). Gating weights are randomly initialized.
-- **Fine-Tuning**: Not fine-tuned; users are encouraged to fine-tune for specific tasks to optimize expert routing.
+- **Fine-Tuning**: Not fine-tuned; users are encouraged to fine-tune for specific tasks to optimize expert routing. A fine-tuned version is already available as [huihui-ai/Huihui-MoE-1B-A0.6B-SFT](https://huggingface.co/huihui-ai/Huihui-MoE-1B-A0.6B-SFT).
 
 ## Usage
 
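For readers unfamiliar with the conversion the **Conversion** bullet describes, here is a minimal sketch of how a dense Qwen3-0.6B checkpoint could be turned into a 3-expert MoE model. It assumes the standard `transformers` Qwen3 layout with a per-layer `mlp` submodule; the `MoEMLP` class and its soft-routing forward pass are illustrative assumptions, not the repository's actual conversion code.

```python
import copy

import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM


class MoEMLP(nn.Module):
    """Hypothetical MoE block: `num_experts` copies of a dense MLP plus a random gate."""

    def __init__(self, dense_mlp: nn.Module, hidden_size: int, num_experts: int = 3):
        super().__init__()
        # Each expert starts as a copy of the pre-trained dense MLP.
        self.experts = nn.ModuleList(
            [copy.deepcopy(dense_mlp) for _ in range(num_experts)]
        )
        # The gate has no pre-trained counterpart, so it is randomly initialized
        # (matching "Gating weights are randomly initialized" above); create it in
        # the same dtype as the experts so BF16 inputs do not hit a dtype mismatch.
        dtype = next(dense_mlp.parameters()).dtype
        self.gate = nn.Linear(hidden_size, num_experts, bias=False, dtype=dtype)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Soft routing for clarity; production MoE layers typically route top-k sparsely.
        weights = torch.softmax(self.gate(hidden_states), dim=-1)  # (B, T, E)
        outputs = torch.stack(
            [expert(hidden_states) for expert in self.experts], dim=-1
        )  # (B, T, H, E)
        return (outputs * weights.unsqueeze(-2)).sum(dim=-1)  # (B, T, H)


model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", torch_dtype=torch.bfloat16
)

# Embeddings, self-attention, and normalization weights are kept untouched;
# only each decoder layer's MLP is swapped for an MoE block built from it.
for layer in model.model.layers:
    layer.mlp = MoEMLP(layer.mlp, model.config.hidden_size, num_experts=3)
```

Since all three experts begin as identical copies and the gate is random, routing is uninformative until training breaks the symmetry, which is why the diff keeps the recommendation to fine-tune and points to the ready-made [huihui-ai/Huihui-MoE-1B-A0.6B-SFT](https://huggingface.co/huihui-ai/Huihui-MoE-1B-A0.6B-SFT) checkpoint.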