OpenMOSE committed (verified)
Commit 0ba6ed3 · 1 Parent(s): 9730e0e

Update README.md

Files changed (1):
  README.md (+1 -1)
README.md CHANGED
@@ -45,7 +45,7 @@ The model implements several key improvements over standard RWKV architectures:
 
 ### Hybrid Design Benefits
 
-- **Linear Attention Inference**: RWKV blocks enable O(1) memory complexity during inference, and hybrids reduce memory usage by 1/7 KVCache.
+- **Linear Attention Inference**: RWKV blocks enable O(1) memory complexity during inference, and the hybrid approach reduces the KVCache to 1/7 of full GQA.
 - **Enhanced Needle Tasks**: Strategic placement of GQA layers significantly improves performance on needle-in-haystack retrieval tasks, addressing a known limitation of pure linear attention models
 - **Implicit Position Encoding**: Interestingly, the model achieves better performance when RoPE (Rotary Position Embedding) is not applied to GQA layers, suggesting that RWKV blocks provide implicit positional encoding capabilities
 
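
As a side note, here is a minimal sketch of the arithmetic behind the "1/7 of full GQA" KVCache figure in the updated bullet. It assumes one GQA layer per seven layers, with the RWKV layers keeping no per-token KV cache; the layer ratio, function name, and layer count are illustrative assumptions, not taken from the repository.

```python
# Illustrative sketch (not from this repo): how a "KV cache at 1/7 of full GQA"
# figure can arise when only 1 out of every 7 layers is a GQA layer and the
# remaining RWKV layers store no per-token KV cache.

def hybrid_kv_cache_fraction(num_layers: int, gqa_interval: int = 7) -> float:
    """Fraction of a full-GQA model's KV cache that the hybrid still needs."""
    gqa_layers = num_layers // gqa_interval  # only these layers store K/V per token
    return gqa_layers / num_layers


if __name__ == "__main__":
    # Example: a hypothetical 28-layer model with one GQA layer per 7-layer group
    print(hybrid_kv_cache_fraction(28))  # 0.142857... ≈ 1/7
```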