Update README.md
README.md CHANGED
@@ -38,8 +38,7 @@ The differences between World & Raven:
* set pipeline = PIPELINE(model, "rwkv_vocab_v20230424") instead of 20B_tokenizer.json (EXACTLY AS WRITTEN HERE. "rwkv_vocab_v20230424" is included in rwkv 0.7.4+)
* use Question/Answer or User/AI or Human/Bot for chat. **DO NOT USE Bob/Alice or Q/A**

-For 0.1/0.4/1.5B models, use **fp32** for first layer (will overflow in fp16 at this moment - fixable in future), or bf16 if you have 30xx/40xx GPUs.
-Example strategy: cuda fp32 *1 -> cuda fp16
+For 0.1/0.4/1.5B models, use **fp32** for first layer (will overflow in fp16 at this moment - fixable in future), or bf16 if you have 30xx/40xx GPUs. Example strategy: cuda fp32 *1 -> cuda fp16

NOTE: the new greedy tokenizer (https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py) will tokenize '\n\n' as one single token instead of ['\n','\n']
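For reference, a minimal sketch of what this updated setup amounts to in code, assuming the `rwkv` pip package (0.7.4+) and a downloaded World model; the model path is a placeholder, not something named in this commit:

```python
from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# Strategy from the line above: keep the first layer in fp32 (fp16
# currently overflows there for the 0.1/0.4/1.5B models), rest in fp16.
model = RWKV(model='/path/to/RWKV-world-model',  # placeholder path
             strategy='cuda fp32 *1 -> cuda fp16')

# Pass the vocab name as a plain string, not a path to 20B_tokenizer.json.
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")
```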
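And a quick way to confirm the NOTE above, reusing `pipeline` from the sketch:

```python
# With the new greedy tokenizer, '\n\n' encodes to a single token
# (20B_tokenizer.json would split it into two '\n' tokens).
tokens = pipeline.encode('\n\n')
print(len(tokens))                    # expected: 1
print(repr(pipeline.decode(tokens)))  # expected: '\n\n'
```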
@@ -60,11 +59,11 @@ Response:
A good chat prompt (replace \n\n in xxx to \n):
```
-
+User: hi

-
+Assistant: Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.

-
+User: xxx

-
+Assistant:
```
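A minimal sketch of driving this prompt format through the pipeline; the question string and the sampling settings are illustrative assumptions, not part of this commit:

```python
from rwkv.utils import PIPELINE_ARGS

question = "How can I cook rice?"          # fills the "xxx" slot above
question = question.replace('\n\n', '\n')  # per the note: no '\n\n' inside xxx

prompt = (
    "User: hi\n\n"
    "Assistant: Hi. I am your assistant and I will provide expert full response "
    "in full details. Please feel free to ask any question and I will always answer it.\n\n"
    f"User: {question}\n\n"
    "Assistant:"
)

args = PIPELINE_ARGS(temperature=1.0, top_p=0.3)
print(pipeline.generate(prompt, token_count=200, args=args))
```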