Update README.md
README.md
CHANGED
@@ -37,7 +37,7 @@ The model implements several key improvements over original RWKV architectures:
 
 ### Hybrid Design Benefits
 
-- **Linear Attention Inference**: RWKV blocks enable O(1) memory complexity during inference, and the hybrid approach reduces the KVCache to 1/
+- **Linear Attention Inference**: RWKV blocks enable O(1) memory complexity during inference, and the hybrid approach reduces the KVCache to 1/9 of full GQA.
 - **Enhanced Needle Tasks**: Strategic placement of GQA layers significantly improves performance on needle-in-haystack retrieval tasks, addressing a known limitation of pure linear attention models
 - **Implicit Position Encoding**: Interestingly, the model achieves better performance when RoPE (Rotary Position Embedding) is not applied to GQA layers, suggesting that RWKV blocks provide implicit positional encoding capabilities
 
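For context on the KVCache figure in the changed line above, here is a minimal sketch of one way a hybrid stack can end up at 1/9 of a full-GQA cache: if roughly 1 in every 9 layers is GQA and the remaining RWKV layers keep only a fixed-size recurrent state, then only the GQA layers ever allocate a per-token KV cache. The layer count, head sizes, and the 1-in-9 schedule below are hypothetical illustrations, not taken from this repository; the actual mechanism behind the 1/9 reduction may differ.

```python
# Hypothetical sketch (not this repo's code): how a hybrid RWKV/GQA stack
# can shrink the per-sequence KV cache to ~1/9 of a full-GQA model.
# Assumption: 1 in every 9 layers is GQA; RWKV layers keep only a
# constant-size recurrent state, so they never allocate a per-token KV cache.

def kv_cache_bytes(n_cached_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Keys + values for every layer that keeps a KV cache (fp16 by default)."""
    return n_cached_layers * 2 * n_kv_heads * head_dim * seq_len * bytes_per_elem

N_LAYERS = 27            # hypothetical depth
GQA_EVERY = 9            # hypothetical schedule: 1 GQA layer per 9 layers
N_KV_HEADS, HEAD_DIM = 8, 128
SEQ_LEN = 4096

# Only these layers attend with GQA and therefore cache K/V.
gqa_layers = [i for i in range(N_LAYERS) if (i + 1) % GQA_EVERY == 0]

full_gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN)
hybrid = kv_cache_bytes(len(gqa_layers), N_KV_HEADS, HEAD_DIM, SEQ_LEN)

print("GQA layer indices:", gqa_layers)
print(f"hybrid / full-GQA KV cache = {hybrid / full_gqa:.3f}")  # 3/27 = 0.111, i.e. 1/9
```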