---
license: apache-2.0
language:
- ja
---
# RWKV-x060-Japanese-11.2B

## 11.2B-Parameter Model Based on the RWKV "Finch" Architecture

Continued training is in progress. As this is an experiment, the model's performance has not been evaluated.

- "YORINOBU"
- Based on the RWKV6-World v2.1 7B 53% model, we applied a layer-expansion approach and tuned the result as a 48-layer, 4096-dimensional model.
- We added 8 layers to the 40-layer model, froze layers 0 to 39, and continued pre-training layers 40 to 47, along with the Embedding and Head layers, on a Japanese corpus (see the sketch after this list).
- Since this is an experimental approach, the model may exhibit unpredictable behavior.
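
The recipe above amounts to appending new blocks to the stack and toggling which parameters receive gradients. Below is a minimal PyTorch-style sketch, assuming a hypothetical `model` whose `blocks`, `emb`, and `head` attributes mirror the RWKV-LM layout; the actual training code is in the repository linked under Training.

```python
import copy

def expand_and_freeze(model, extra_layers=8):
    """Grow a 40-layer RWKV model to 48 layers and freeze the original stack.

    Hypothetical sketch: `model.blocks` is assumed to be an nn.ModuleList of
    RWKV blocks, with `model.emb` / `model.head` the embedding and output
    projection, mirroring the RWKV-LM layout.
    """
    # Initialize the new blocks (layers 40..47) as copies of the last block.
    # This is one common initialization; the init actually used here is not
    # specified in the model card.
    for _ in range(extra_layers):
        model.blocks.append(copy.deepcopy(model.blocks[-1]))

    # Freeze layers 0..39 so only the new layers keep learning.
    for block in model.blocks[:40]:
        for p in block.parameters():
            p.requires_grad = False

    # Layers 40..47 plus Embedding and Head stay trainable for the
    # continued pre-training on the Japanese corpus.
    for module in (*model.blocks[40:], model.emb, model.head):
        for p in module.parameters():
            p.requires_grad = True
    return model
```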

## Training

- Continuous pre-training using RWKV-LM-LISA in Anarchy mode (a minimal sketch of the per-step layer sampling follows this list)
- https://github.com/OpenMOSE/RWKV-LM-LISA
- Single A6000; LISA trains 4 layers at each step
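
For illustration, here is a minimal sketch of a LISA-style step under the assumptions stated above: only layers 40 to 47 are sampling candidates, 4 of them are activated per step, and the model exposes a hypothetical HF-style forward that returns a loss. The exact Anarchy-mode sampling policy is defined in RWKV-LM-LISA itself.

```python
import random

def lisa_step(model, optimizer, batch, trainable_ids=range(40, 48), n_active=4):
    """One LISA-style update: activate a random subset of layers, then step.

    Generic illustration of layerwise importance sampling, not the exact
    Anarchy-mode logic from RWKV-LM-LISA.
    """
    # Re-freeze all candidate layers, then unfreeze a fresh random sample of 4.
    for i in trainable_ids:
        for p in model.blocks[i].parameters():
            p.requires_grad = False
    for i in random.sample(list(trainable_ids), n_active):
        for p in model.blocks[i].parameters():
            p.requires_grad = True

    # Embedding and Head remain trainable on every step (see the list above).
    # Hypothetical forward signature returning a language-model loss.
    loss = model(batch["input_ids"], labels=batch["labels"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return loss.item()
```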

2024 OpenMOSE