Commit aaa59d0 (verified) · committed by OpenMOSE · 1 Parent(s): 89cd398

Update README.md

Files changed (1)
  1. README.md +0 -7
README.md CHANGED
@@ -41,13 +41,6 @@ The **sole purpose** of this project was to **test the feasibility of replacing
 
  ---
 
- ### 💀 **The Painful Side**
- - **Spike Hell**: Ctx4096 training introduced catastrophic **KL Loss spikes**, requiring constant rollbacks and manual interventions.
- - **VRAM starvation**: Running 14B models with long contexts meant **batch sizes** were reduced to **32**, relying on **Gradient Accumulation** just to survive.
- - **System Prompt Overfitting**: Earlier phases locked the model into repeating fixed prompts, needing a **full distillation reset**.
-
- ---
-
  ### 📈 **Scaling Observations**
  - PRWKV scales from **3B** to **14B** parameters.
  - **14B KD** runs achieved **KL divergence < 0.1**, proving **RNN TimeMix blocks can indeed mimic Transformer Attention** at high fidelity.