BasedBase committed on
Commit cdf566e · verified · 1 Parent(s): 9792742

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -22,6 +22,8 @@ It is the result of applying a LoRA created via an SVD-based distillation pipeli

  The primary goal was to explore the high-fidelity transfer of complex reasoning patterns, particularly those encoded within the Mixture-of-Experts (MoE) layers, from a frontier-class model to a consumer-accessible one.

+ You should notice that the model has a more confident, more linear chain-of-thought than the base qwen3-30b-a3b-thinking-2507 model, similar to that of Deepseek 3.1. This distill tends to overthink much less than the base model and provides more accurate, better-structured answers.
+
  ## The Distillation Methodology

  This model was not trained in a conventional sense. Instead, it was created using a layer-by-layer, SVD-based distillation process.