QuixiAI/Qwen3-72B-Embiggened

ehartford committed on Jun 13

Commit a3d8e2d · verified · 1 Parent(s): a990883

Update README.md

Files changed (1)

README.md +2 -0

@@ -11,6 +11,8 @@ base_model:
 Qwen3-72B-Embiggened is an experimental expansion of Qwen3-32B to match the full Qwen3-72B architecture. Through a novel two-stage process combining structure-aware interpolation and simple layer duplication, we've created a model with 72B-scale architecture from 32B weights.
+the code to generate this model is here: [stage2_v3.py](https://huggingface.co/cognitivecomputations/Qwen3-72B-Embiggened/blob/main/stage2_v3.py)
 The next step of this process is to distill Qwen3-235B into this model.  The resulting model will be called Qwen3-72B-Distilled
 This model was made possible by excellent AMD mi300x compute generously provided by [Hot Aisle](https://hotaisle.xyz/).
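
The README text above names two ingredients, interpolation and layer duplication, without spelling them out. The toy sketch below illustrates the general idea only; it is not taken from `stage2_v3.py`. Plain linear interpolation stands in for the "structure-aware" interpolation, whose details this page does not give, and the function names `interpolate_row`, `widen_matrix`, and `duplicate_layers` are hypothetical.

```python
# Toy illustration only: NOT the contents of stage2_v3.py.
# Assumption: weights are plain nested lists of floats, and "interpolation"
# means simple linear resampling along each axis.

def interpolate_row(row, new_len):
    """Linearly resample a 1-D list of floats to new_len entries."""
    old_len = len(row)
    if new_len == 1 or old_len == 1:
        return [row[0]] * new_len
    out = []
    for i in range(new_len):
        # Map output index i onto a fractional position in the input.
        pos = i * (old_len - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(row[lo] * (1 - frac) + row[hi] * frac)
    return out


def widen_matrix(matrix, new_rows, new_cols):
    """Resample a 2-D weight matrix to (new_rows, new_cols)."""
    # First stretch every row to the new column count...
    widened = [interpolate_row(r, new_cols) for r in matrix]
    # ...then stretch every column to the new row count.
    columns = [interpolate_row(list(c), new_rows) for c in zip(*widened)]
    return [list(r) for r in zip(*columns)]


def duplicate_layers(layers, target_depth):
    """Grow a layer stack to target_depth by repeating evenly spaced layers."""
    n = len(layers)
    return [layers[i * n // target_depth] for i in range(target_depth)]


if __name__ == "__main__":
    small = [[0.0, 2.0], [4.0, 6.0]]
    print(widen_matrix(small, 3, 3))
    print(duplicate_layers(["L0", "L1", "L2", "L3"], 5))
```

Duplicating whole layers keeps each copied block internally consistent, while interpolation changes tensor shapes; how the real script orders or combines the two is not stated here.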