Update README.md
README.md
CHANGED
@@ -37,7 +37,7 @@ The model implements several key improvements over original RWKV architectures:
 
 ### Hybrid Design Benefits
 
-- **Linear Attention Inference**: RWKV blocks enable O(1) memory complexity during inference, and the hybrid approach reduces the KVCache to 1/
+- **Linear Attention Inference**: RWKV blocks enable O(1) memory complexity during inference, and the hybrid approach reduces the KVCache to 1/9 of full GQA.
 - **Enhanced Needle Tasks**: Strategic placement of GQA layers significantly improves performance on needle-in-haystack retrieval tasks, addressing a known limitation of pure linear attention models
 - **Implicit Position Encoding**: Interestingly, the model achieves better performance when RoPE (Rotary Position Embedding) is not applied to GQA layers, suggesting that RWKV blocks provide implicit positional encoding capabilities
 
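For context on the KVCache figure in the changed line above, here is a minimal sketch of one way a hybrid stack can end up at 1/9 of a full-GQA cache: if roughly 1 in every 9 layers is GQA and the remaining RWKV layers keep only a fixed-size recurrent state, then only the GQA layers ever allocate a per-token KV cache. The layer count, head sizes, and the 1-in-9 schedule below are hypothetical illustrations, not taken from this repository; the actual mechanism behind the 1/9 reduction may differ.

```python
# Hypothetical sketch (not this repo's code): how a hybrid RWKV/GQA stack
# can shrink the per-sequence KV cache to ~1/9 of a full-GQA model.
# Assumption: 1 in every 9 layers is GQA; RWKV layers keep only a
# constant-size recurrent state, so they never allocate a per-token KV cache.

def kv_cache_bytes(n_cached_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Keys + values for every layer that keeps a KV cache (fp16 by default)."""
    return n_cached_layers * 2 * n_kv_heads * head_dim * seq_len * bytes_per_elem

N_LAYERS = 27            # hypothetical depth
GQA_EVERY = 9            # hypothetical schedule: 1 GQA layer per 9 layers
N_KV_HEADS, HEAD_DIM = 8, 128
SEQ_LEN = 4096

# Only these layers attend with GQA and therefore cache K/V.
gqa_layers = [i for i in range(N_LAYERS) if (i + 1) % GQA_EVERY == 0]

full_gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN)
hybrid = kv_cache_bytes(len(gqa_layers), N_KV_HEADS, HEAD_DIM, SEQ_LEN)

print("GQA layer indices:", gqa_layers)
print(f"hybrid / full-GQA KV cache = {hybrid / full_gqa:.3f}")  # 3/27 = 0.111, i.e. 1/9
```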