Update README.md
README.md
CHANGED
@@ -36,8 +36,8 @@ HRWKV7-Reka-Flash3-Preview is an experimental hybrid architecture model that com
 The model implements several key improvements over standard RWKV architectures:
 
 1. **Token Shift Removal**: Unlike traditional RWKV, the hxa079 variant removes token shifting mechanisms
-2. **GroupNorm Removal**: Eliminates GroupNorm
-3. **k_first Introduction**:
+2. **GroupNorm Removal**: Eliminates GroupNorm for training stability
+3. **k_first Introduction**: Experimentally adopted the approach of residually connecting k layers in layer 0.
 
 ### Hybrid Design Benefits
 