Update README.md
--- a/README.md
+++ b/README.md
@@ -11,6 +11,8 @@ base_model:
 
 Qwen3-72B-Embiggened is an experimental expansion of Qwen3-32B to match the full Qwen3-72B architecture. Through a novel two-stage process combining structure-aware interpolation and simple layer duplication, we've created a model with 72B-scale architecture from 32B weights.
 
+the code to generate this model is here: [stage2_v3.py](https://huggingface.co/cognitivecomputations/Qwen3-72B-Embiggened/blob/main/stage2_v3.py)
+
 The next step of this process is to distill Qwen3-235B into this model. The resulting model will be called Qwen3-72B-Distilled
 
 This model was made possible by excellent AMD mi300x compute generously provided by [Hot Aisle](https://hotaisle.xyz/).
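
The README names two stages (interpolation to widen weights, then layer duplication to deepen the stack) without describing them, and the actual implementation lives in the linked `stage2_v3.py`. The sketch below is only a toy illustration of the two ideas in plain Python, not the author's method: the function names `interpolate_vector` and `duplication_plan`, the plain linear interpolation, and the evenly spread duplication schedule are all assumptions made for this example.

```python
from typing import List


def interpolate_vector(values: List[float], new_len: int) -> List[float]:
    """Linearly resample a 1-D list of weights to new_len entries.

    Hypothetical stand-in for the "interpolation" stage: the real script
    may use a different, structure-aware scheme.
    """
    if not values or new_len < 1:
        raise ValueError("need a non-empty input and a positive target length")
    if len(values) == 1 or new_len == 1:
        return [values[0]] * new_len

    # Map each new index onto a fractional position in the old vector.
    scale = (len(values) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def duplication_plan(old_layers: int, new_layers: int) -> List[int]:
    """For each new layer index, return the old layer index to copy from.

    Hypothetical stand-in for the "layer duplication" stage: duplicates are
    spread evenly, and every old layer is used at least once.
    """
    if old_layers < 1 or new_layers < old_layers:
        raise ValueError("new_layers must be >= old_layers >= 1")
    return [i * old_layers // new_layers for i in range(new_layers)]


if __name__ == "__main__":
    # Widen a 3-entry weight vector to 5 entries.
    print(interpolate_vector([0.0, 1.0, 2.0], 5))
    # Deepen a 4-layer stack to 6 layers; repeated indices are duplicated layers.
    print(duplication_plan(4, 6))
```

In a real expansion the same resampling idea would have to be applied consistently across every tensor that shares a dimension (for example, hidden size in both attention and MLP projections), which is presumably what "structure-aware" refers to; the toy version above ignores that entirely.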