OpenMOSE committed on
Commit 3bb3693 · verified · 1 Parent(s): bfe5add

Update README.md

  1. README.md +2 -2
README.md CHANGED
@@ -35,8 +35,8 @@ HRWKV7-Reka-Flash3-Preview is an experimental hybrid architecture model that com

  The model implements several key improvements over standard RWKV architectures:

- 1. **Token Shift Removal**: Unlike traditional RWKV, the hxa079 variant removes token shifting mechanisms
- 2. **GroupNorm Removal**: Eliminates GroupNorm for training stability
+ 1. **Token Shift Removal**: In order to effectively inherit the teacher model weights, we removed the residual connection one token ago.
+ 2. **GroupNorm Removal**: Helps improve training stability issues
  3. **k_first Introduction**: Experimentally adopted the approach of residually connecting k layers in layer 0.

  ### Hybrid Design Benefits