Update README.md
README.md
CHANGED
@@ -35,8 +35,8 @@ HRWKV7-Reka-Flash3-Preview is an experimental hybrid architecture model that com
 
 The model implements several key improvements over standard RWKV architectures:
 
-1. **Token Shift Removal**:
-2. **GroupNorm Removal**:
+1. **Token Shift Removal**: To inherit the teacher model's weights effectively, we removed token shift, the residual connection to the previous token.
+2. **GroupNorm Removal**: Helps mitigate training stability issues.
 3. **k_first Introduction**: Experimentally adopted the approach of residually connecting k layers in layer 0.
 
 ### Hybrid Design Benefits
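For readers unfamiliar with the term, the token shift mentioned in item 1 can be illustrated with a minimal sketch. It assumes the usual RWKV formulation, in which each token's features are linearly interpolated with the previous token's features before the projections; the exact parameterization differs between RWKV versions. The names `token_shift`, `without_token_shift`, and `mu` are hypothetical and are not taken from this repository.

```python
def token_shift(xs, mu):
    """Blend each token's features with the previous token's features.

    xs: list of per-token feature vectors (lists of floats), in sequence order.
    mu: per-channel mixing weights; 0 keeps the current token, 1 uses the previous one.
    This is an illustrative formulation, not the model's actual code.
    """
    out = []
    prev = [0.0] * len(mu)  # the first token has no predecessor, so use zeros
    for x in xs:
        out.append([xi + (pi - xi) * m for xi, pi, m in zip(x, prev, mu)])
        prev = x
    return out


def without_token_shift(xs):
    """With token shift removed, each projection sees only the current token."""
    return [list(x) for x in xs]


xs = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
shifted = token_shift(xs, [0.0, 1.0])
plain = without_token_shift(xs)
```

With the shift removed, the projections receive the same per-token input that a standard transformer layer would, which is presumably what makes the teacher's weights easier to reuse.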