Commit f69d09a (verified) · committed by OpenMOSE · 1 parent: ef5fe5d

Update README.md

Files changed (1): README.md (+4, −0)

README.md CHANGED
@@ -12,6 +12,10 @@ tags:
  <img src="./hxa079.png" style="border-radius: 15px; width: 60%; height: 60%; object-fit: cover; box-shadow: 10px 10px 20px rgba(0, 0, 0, 0.5); border: 2px solid white;" alt="PRWKV" />
  </div>
 
+ > I'm simply exploring the possibility of linearizing existing Transformer models.
+ > It's still far from perfect,
+ > but I hope you'll bear with me as I continue this journey. :)
+
  ### Model Description
 
  RWKV-Reka-3.1 Flash is an RNN hybrid architecture model that combines RWKV v7's linear attention mechanism with Group Query Attention (GQA) layers. Built upon the Reka-flash3.1 21B foundation, this model replaces most Transformer attention blocks with RWKV blocks while strategically maintaining some GQA layers to enhance performance on specific tasks.
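
For readers curious about what the layer-replacement scheme described above might look like in practice, here is a minimal, hypothetical PyTorch sketch of a hybrid stack that uses RWKV-style blocks at most depths and keeps attention at a few chosen depths. Every name and number in it (`RWKVBlock`, `GQABlock`, `build_hybrid_stack`, the layer indices and dimensions, and the use of plain multi-head attention as a stand-in for GQA) is an illustrative assumption, not the repository's actual implementation.

```python
# Hypothetical sketch only: interleave RWKV-style blocks with a few retained
# attention blocks. All names, indices, and sizes are placeholders.
import torch
import torch.nn as nn


class RWKVBlock(nn.Module):
    """Placeholder for an RWKV v7-style linear-attention (time-mix) block."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mix = nn.Linear(dim, dim)  # stands in for the recurrent time-mix

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mix(self.norm(x))


class GQABlock(nn.Module):
    """Placeholder for a retained Transformer block; plain multi-head
    attention is used here as a stand-in for grouped-query attention."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


def build_hybrid_stack(n_layers: int, gqa_layers: set, dim: int, n_heads: int) -> nn.ModuleList:
    """Use an attention block at the chosen depths, an RWKV block everywhere else."""
    return nn.ModuleList(
        [GQABlock(dim, n_heads) if i in gqa_layers else RWKVBlock(dim)
         for i in range(n_layers)]
    )


# Toy example: a 24-layer stack keeping attention at four depths (hypothetical).
layers = build_hybrid_stack(n_layers=24, gqa_layers={5, 11, 17, 23}, dim=256, n_heads=8)
x = torch.randn(1, 16, 256)
for layer in layers:
    x = layer(x)
```

The point of the sketch is only the layer-selection idea: most depths become linear-time RWKV blocks, while a small set of depths keeps softmax attention, matching the description of replacing most Transformer attention blocks while strategically retaining some GQA layers.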