Update README.md
README.md
CHANGED
@@ -45,7 +45,7 @@ The model implements several key improvements over standard RWKV architectures:
 
 ### Hybrid Design Benefits
 
-- **Linear Attention Inference**: RWKV blocks enable O(1) memory complexity during inference, and
+- **Linear Attention Inference**: RWKV blocks enable O(1) memory complexity during inference, and the hybrid approach reduces the KV cache to 1/7 of that of full GQA.
 - **Enhanced Needle Tasks**: Strategic placement of GQA layers significantly improves performance on needle-in-haystack retrieval tasks, addressing a known limitation of pure linear attention models
 - **Implicit Position Encoding**: Interestingly, the model achieves better performance when RoPE (Rotary Position Embedding) is not applied to GQA layers, suggesting that RWKV blocks provide implicit positional encoding capabilities
 
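To make the 1/7 figure in the added "Linear Attention Inference" bullet concrete, here is a minimal back-of-the-envelope sketch. It assumes the ratio comes from interleaving one GQA layer per seven layers, with the remaining layers being RWKV blocks that keep a constant-size recurrent state instead of a KV cache; the layer count, head configuration, context length, and the `kv_cache_bytes` helper are all illustrative assumptions, not values taken from this repository.

```python
# Back-of-the-envelope KV-cache arithmetic for the hybrid design.
# All concrete numbers below are assumptions for illustration only.

def kv_cache_bytes(num_gqa_layers, seq_len, num_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    """Bytes needed to cache K and V for seq_len tokens across the GQA layers."""
    # Each GQA layer stores K and V: 2 tensors of (seq_len x num_kv_heads x head_dim).
    return num_gqa_layers * seq_len * num_kv_heads * head_dim * 2 * bytes_per_elem

num_layers = 28       # hypothetical total depth
seq_len = 32_768      # hypothetical context length

full_gqa = kv_cache_bytes(num_layers, seq_len)      # every layer uses GQA
hybrid = kv_cache_bytes(num_layers // 7, seq_len)   # 1 GQA layer per 7;
                                                    # RWKV layers need no KV cache

print(f"full GQA: {full_gqa / 2**30:.2f} GiB")
print(f"hybrid:   {hybrid / 2**30:.2f} GiB ({hybrid / full_gqa:.2%} of full GQA)")
```

Under these assumptions the hybrid cache is exactly 4/28 = 1/7 of the full-GQA cache, and the fraction is independent of the context length.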