Update README.md
README.md CHANGED
@@ -16,6 +16,7 @@ tags:
## Model Overview

Huihui-MoE-1B-A0.6B is a **Mixture of Experts (MoE)** language model developed by **huihui.ai**, built upon the **[Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)** base model. It enhances the standard Transformer architecture by replacing MLP layers with MoE layers, each containing 3 experts, to achieve high performance with efficient inference. The model is designed for natural language processing tasks, including text generation, question answering, and conversational applications.

+This version does not support ollama: because tie_word_embeddings=True, no separate lm_head weights are saved in the checkpoint, so ollama cannot load the model. If ollama support is required, please use the latest version, [huihui-ai/Huihui-MoE-1.2B-A0.6B](https://huggingface.co/huihui-ai/Huihui-MoE-1.2B-A0.6B).

- **Architecture**: Qwen3MoeForCausalLM model with 3 experts per layer (num_experts=3), activating 1 expert per token (num_experts_per_tok=1).
- **Total Parameters**: ~1.1 billion (1B)
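As a quick-start for the overview and architecture notes above, here is a minimal loading and generation sketch using the `transformers` library. The repo id `huihui-ai/Huihui-MoE-1B-A0.6B` is assumed from the model name and organization (it is not stated in this diff); adjust it if the actual repository differs.

```python
# Minimal sketch, not the official usage snippet; the repo id below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huihui-ai/Huihui-MoE-1B-A0.6B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The MoE routing settings from the card should be visible in the config.
print(model.config.num_experts, model.config.num_experts_per_tok)  # expected: 3 1

# Single-turn chat using the Qwen3-style chat template.
messages = [{"role": "user", "content": "Explain Mixture of Experts in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```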
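The ollama limitation added in this diff comes from weight tying: with tie_word_embeddings=True, the output head reuses the input embedding matrix instead of storing a separate lm_head tensor, which, per the note above, is why ollama cannot load it. A small sketch (same assumed repo id as above) that checks this from Python:

```python
# Sketch: verify that no independent lm_head weight exists, i.e. the output
# head is tied to the input embeddings. Repo id is assumed, as above.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "huihui-ai/Huihui-MoE-1B-A0.6B"  # assumed repo id
config = AutoConfig.from_pretrained(model_id)
print(config.tie_word_embeddings)  # expected: True, per the note above

model = AutoModelForCausalLM.from_pretrained(model_id)
# With tied embeddings, lm_head shares its weight tensor with embed_tokens,
# so the checkpoint has nothing to export as a standalone lm_head.
print(model.get_output_embeddings().weight is model.get_input_embeddings().weight)
```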