BasedBase/Qwen3-30B-A3B-Thinking-2507-Deepseek-v3.1-Distill

BasedBase committed 6 days ago

Commit cdf566e · verified · 1 Parent(s): 9792742

Update README.md

Files changed (1)

README.md +2 -0

@@ -22,6 +22,8 @@ It is the result of applying a LoRA created via an SVD-based distillation pipeli
 The primary goal was to explore the high-fidelity transfer of complex reasoning patterns, particularly those encoded within the Mixture-of-Experts (MoE) layers, from a frontier-class model to a consumer-accessible one.
+You should notice that the model has a more confident and linear chain-of-thought than the base qwen3-30b-a3b-thinking-2507 model, similar to Deepseek 3.1. This distill tends to overthink much less than the base model and provides more accurate, better-structured answers.
 ## The Distillation Methodology
 This model was not trained in a conventional sense. Instead, it was created using a layer-by-layer, SVD-based distillation process.
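
For readers unfamiliar with the idea, the following is a minimal sketch of how a LoRA-style low-rank update can be extracted from a weight difference with a truncated SVD. It is not the author's pipeline, which this diff does not show. The function name `svd_lora_factors`, the matrix shapes, and the assumption that teacher and student weights share the same shape are all hypothetical simplifications chosen for illustration.

```python
import numpy as np


def svd_lora_factors(w_student, w_teacher, rank):
    """Return (B, A) so that B @ A is the best rank-`rank` approximation
    of the difference w_teacher - w_student (hypothetical helper)."""
    delta = w_teacher - w_student
    # Thin SVD: delta = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    # Split the singular values evenly between the two factors.
    sqrt_s = np.sqrt(s[:rank])
    b = u[:, :rank] * sqrt_s            # shape (out_features, rank)
    a = sqrt_s[:, None] * vt[:rank]     # shape (rank, in_features)
    return b, a


# Toy data (assumption): the "teacher" equals the student plus a rank-4 change.
rng = np.random.default_rng(0)
w_student = rng.standard_normal((64, 32))
w_teacher = w_student + rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))

b, a = svd_lora_factors(w_student, w_teacher, rank=4)
merged = w_student + b @ a  # applying the low-rank update to the student layer
```

In this toy setup the difference has rank 4 exactly, so a rank-4 update reproduces the teacher weights up to floating-point error. With real models the difference is generally full rank, and the chosen rank trades adapter size against approximation quality.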