ehartford committed on
Commit a3d8e2d · verified · 1 Parent(s): a990883

Update README.md

  1. README.md +2 -0
README.md CHANGED
@@ -11,6 +11,8 @@ base_model:
 
 Qwen3-72B-Embiggened is an experimental expansion of Qwen3-32B to match the full Qwen3-72B architecture. Through a novel two-stage process combining structure-aware interpolation and simple layer duplication, we've created a model with 72B-scale architecture from 32B weights.
 
+the code to generate this model is here: [stage2_v3.py](https://huggingface.co/cognitivecomputations/Qwen3-72B-Embiggened/blob/main/stage2_v3.py)
+
 The next step of this process is to distill Qwen3-235B into this model. The resulting model will be called Qwen3-72B-Distilled
 
 This model was made possible by excellent AMD mi300x compute generously provided by [Hot Aisle](https://hotaisle.xyz/).
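
The README mentions "simple layer duplication" as one of the two stages. As a rough illustration only, and not the contents of stage2_v3.py, the sketch below shows one way to grow a stack of layers by copying evenly spaced source layers. The function names `duplication_plan` and `expand_layers` are hypothetical, and the 64 and 80 layer counts in the usage lines are assumed values chosen for the example; the README does not state them.

```python
import copy


def duplication_plan(n_src: int, n_dst: int) -> list[int]:
    """Return, for each target layer index, the source layer index it copies.

    Hypothetical helper: spreads the duplicated layers evenly over the depth.
    """
    if n_src <= 0 or n_dst < n_src:
        raise ValueError("need 0 < n_src <= n_dst")
    # Integer scaling keeps the mapping monotonic and uses every source layer.
    return [i * n_src // n_dst for i in range(n_dst)]


def expand_layers(layers: list, n_dst: int) -> list:
    """Build a deeper stack by deep-copying source layers according to the plan."""
    plan = duplication_plan(len(layers), n_dst)
    return [copy.deepcopy(layers[j]) for j in plan]


# Assumed example sizes (not taken from the README): 64 source layers, 80 target layers.
plan = duplication_plan(64, 80)
duplicated = len(plan) - len(set(plan))  # number of extra copies introduced
```

With this mapping every source layer appears at least once and layer order is preserved, so the expanded stack starts out close to the original model in function. How the real script chooses which layers to duplicate may differ.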