This model is an improvement over the old one. It uses the new tokenizer_config.
Our dataset was made from our university's student notebook. It covers majors, university regulations, and other information about our university.
[hcmue_qa](https://huggingface.co/datasets/Tamnemtf/hcmue_qa)
## Instruction Format
To leverage instruction fine-tuning, your prompt should be surrounded by `<|im_start|>` and `<|im_end|>` tokens. The very first instruction should begin with the beginning-of-sentence token id; subsequent instructions should not. The assistant's generation is ended by the end-of-sentence token id.
E.g.
```python
role = "user"
prompt = "hi"
chatml = f"<|im_start|>{role}\n{prompt}<|im_end|>\n"
```
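As a rough sketch, the multi-turn layout described above can be assembled as follows. The `build_chatml` helper and the literal `"<s>"` string are assumptions for illustration only; in practice the beginning-of-sentence id comes from the model's tokenizer config.

```python
# Sketch: assemble a multi-turn ChatML-style prompt.
# Assumption: "<s>" stands in for the beginning-of-sentence token;
# the real special-token ids come from the tokenizer config.
def build_chatml(turns, bos="<s>"):
    """turns: list of (role, text) pairs, e.g. [("user", "hi")]."""
    prompt = bos  # only the very first instruction carries the BOS token
    for role, text in turns:
        prompt += f"<|im_start|>{role}\n{text}<|im_end|>\n"
    return prompt

print(build_chatml([("user", "hi"), ("assistant", "hello")]))
```

Note that the BOS token is emitted once, before the first instruction, while every turn is wrapped in its own `<|im_start|>`/`<|im_end|>` pair.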
Here is the [dataset](https://huggingface.co/datasets/Tamnemtf/hcmue-new-template) after adding this format.
### Training Procedure
```python