Another small issue: when you enter a prompt that might have resulted in a refusal from a previous model, the response will be more free-form and will probably have a touch of completion-style text in it.

So far, the strongest anti-refusal bias seems to be at 0 ctx - the first prompt. It is still present further down the context, albeit a bit weaker. I plan to expand the rawrr dataset and include more samples without a system prompt, which should help here.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" alt="made with Unsloth" width="400" height="64"/>](https://github.com/unslothai/unsloth)
## Unsloth training parameters (DPO Stage)
- lora_alpha: 32
- max_length: 2200
- learning_rate: 0.00006
- lr_scheduler_type: "cosine"
- lr_scheduler_kwargs: { "num_cycles": 0.3 }
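For reference, these parameters map onto a TRL `DPOConfig` roughly as follows. This is a sketch, not the actual training script: the base model name, `r`, and the dataset are placeholders I am assuming, and only the five values listed above come from this README (newer TRL versions take `processing_class` where older ones took `tokenizer`).

```python
from unsloth import FastLanguageModel
from trl import DPOConfig, DPOTrainer

# Placeholder checkpoint; the actual base model is not named in this section.
model, tokenizer = FastLanguageModel.from_pretrained("your-base-model")
model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # assumption; only lora_alpha is stated above
    lora_alpha=32,  # from the list above
)

args = DPOConfig(
    output_dir="outputs",
    max_length=2200,          # from the list above
    learning_rate=0.00006,    # from the list above
    lr_scheduler_type="cosine",
    lr_scheduler_kwargs={"num_cycles": 0.3},
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # preference pairs: prompt / chosen / rejected
    processing_class=tokenizer,
)
trainer.train()
```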
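One detail worth noting about the scheduler: with the Hugging Face cosine schedule, `num_cycles` below the default 0.5 means the learning rate never decays all the way to zero. A minimal sketch of the decay factor, mirroring the formula in `transformers`' `get_cosine_schedule_with_warmup` (that Unsloth passes `lr_scheduler_kwargs` straight through to it is my assumption):

```python
import math

def cosine_lr_factor(step, total_steps, warmup_steps=0, num_cycles=0.3):
    """Multiplier applied to the peak learning rate at a given step.

    With num_cycles=0.3 the cosine never completes its half-period,
    so training ends at a non-trivial fraction of the peak LR.
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))

peak_lr = 0.00006
# At the last step, num_cycles=0.3 leaves roughly 35% of the peak LR,
# whereas the default num_cycles=0.5 would decay to ~0.
final_lr = peak_lr * cosine_lr_factor(1000, 1000, num_cycles=0.3)
```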