Update README.md
README.md
@@ -1,3 +1,44 @@
----
-license: apache-2.0
----
+---
+license: apache-2.0
+---
+# RWKV-x070-2B9-CJE-Instruct Model Card
+
+## Model Overview
+- **Model Name**: RWKV-x070-2B9-CJE-Instruct
+- **Description**: An instruction-tuned model specialized for Japanese, Chinese, and English
+- **Base Model**: rwkv-x070-2b9-world-v3-40%trained-20250113-ctx4k.pth
+- **Architecture**: RWKV x070 "Goose"
+- **Parameters**: 2.9B
+- **Model Dimension**: 2048
+- **Number of Layers**: 32
+
+## Fine-tuning Details
+
+### Training Configuration
+- **Trainer**: RWKV-LM-RLHF
+- **PEFT Mode**: Hybrid training that combines frozen embeddings, Bone (Block Affine Transformation), and full-parameter training
+- **SFT Method**: SmoothingLoss SFT
+- **Context Window**: 5120 tokens (trained with a 1024-token overlap)
+
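The context-window entry above does not say how the overlap was applied. One plausible reading, offered only as an assumption, is that long samples were cut into windows of at most 5120 tokens whose start positions advance by 5120 - 1024 = 4096 tokens, so that neighbouring windows share 1024 tokens. The helper name `split_with_overlap` is hypothetical and is not taken from RWKV-LM-RLHF.

```python
from typing import List, Sequence

CTX_LEN = 5120  # context window stated in the model card
OVERLAP = 1024  # overlap stated in the model card


def split_with_overlap(
    tokens: Sequence[int],
    ctx_len: int = CTX_LEN,
    overlap: int = OVERLAP,
) -> List[List[int]]:
    """Split a token sequence into windows of at most ctx_len tokens.

    Assumption: consecutive windows share `overlap` tokens, so each
    window starts (ctx_len - overlap) tokens after the previous one.
    """
    if not 0 <= overlap < ctx_len:
        raise ValueError("overlap must satisfy 0 <= overlap < ctx_len")
    stride = ctx_len - overlap
    windows: List[List[int]] = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + ctx_len]))
        # Stop once the current window reaches the end of the sequence.
        if start + ctx_len >= len(tokens):
            break
        start += stride
    return windows


if __name__ == "__main__":
    sample = list(range(10_000))  # stand-in for real token ids
    for i, window in enumerate(split_with_overlap(sample)):
        print(i, window[0], len(window))
```

With these defaults, a sample shorter than 5120 tokens stays in a single window, and only longer samples are split.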
+### Dataset Specifications
+- **Size**: 800k pairs
+- **Content**:
+  - Mixed data in Japanese, Chinese, and English
+  - Conversations
+  - Programming code
+  - Translation tasks
+  - Chain-of-Thought reasoning tasks
+
+### Important Note
+- Set the end token to '\n\n\x17'
+
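How the end token is passed depends on the inference runtime, which the model card does not name. The sketch below is runtime-agnostic and works on decoded text only: it reads the end token from the note above as the Python string `"\n\n\x17"` (two newlines followed by the 0x17 control character), which is an assumption about how the escapes are meant. The function names are hypothetical.

```python
from typing import Iterable, Iterator

END_TOKEN = "\n\n\x17"  # end token from the model card, read as Python escapes


def truncate_at_end_token(text: str, end_token: str = END_TOKEN) -> str:
    """Return the text before the first end token, or all of it if absent."""
    idx = text.find(end_token)
    return text if idx == -1 else text[:idx]


def stream_until_end_token(
    pieces: Iterable[str],
    end_token: str = END_TOKEN,
) -> Iterator[str]:
    """Yield decoded pieces and stop when the end token appears.

    The last len(end_token) - 1 characters are held back each step,
    because they could be the start of an end token that is split
    across two pieces.
    """
    buffer = ""
    for piece in pieces:
        buffer += piece
        idx = buffer.find(end_token)
        if idx != -1:
            if idx:
                yield buffer[:idx]
            return
        safe = len(buffer) - (len(end_token) - 1)
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    # The stream ended without an end token: flush what is left.
    if buffer:
        yield buffer


if __name__ == "__main__":
    chunks = ["Hello", "\n", "\n\x17", "ignored"]
    print("".join(stream_until_end_token(chunks)))
```

Holding back a short tail costs a few characters of latency but avoids emitting half of the end token when it arrives split across pieces.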
+### Limitations and Considerations
+- This is an experimental model; inference stability is not guaranteed
+- Unexpected behavior may occur
+- The model is under continuous improvement; feedback is welcome
+
+## License
+Apache License 2.0
+
+## Acknowledgments
+We thank the authors of the RWKV base model and the RWKV community for their support in developing this model.