Update README.md
README.md CHANGED
@@ -34,11 +34,11 @@ This is the model card of a 🤗 transformers model that has been pushed on the

Use the code below to get started with the model.

-
+ System Prompt:
+ Template: `"You are a helpful, respectful, and honest assistant who always responds to the user in a harmless way. Your response should maximize weighted rating = helpfulness*{weight_helpfulness} + verbosity*{weight_verbosity}"`
+ Value Choices: `weight_helpfulness` is an integer from 0 to 100, and `(weight_verbosity/100)**2 + (weight_helpfulness/100)**2 == 1`
+ The maximum `weight_helpfulness` is 100; the lowest suggested value is 71 (≈ 100/√2, at which helpfulness and verbosity receive equal weight).
+
+ The model will generate a response that implicitly maximizes the weighted rating `helpfulness*weight_helpfulness + verbosity*weight_verbosity`, where `helpfulness` and `verbosity` are two reward objectives that range from 0 to 100.

We suggest choosing the ratio `weight_verbosity/weight_helpfulness` first. For instance, suppose `weight_verbosity/weight_helpfulness` is equal to `tan(-15°)`:
```python
@@ -47,7 +47,7 @@ import torch
import numpy as np

# Here we show how to use the DPA model to generate a response to a user prompt.
-device = "cuda
+device = "cuda"
model = AutoModelForCausalLM.from_pretrained("Haoxiang-Wang/DPA-v1-Mistral-7B", torch_dtype=torch.bfloat16, device_map=device)
tokenizer = AutoTokenizer.from_pretrained("Haoxiang-Wang/DPA-v1-Mistral-7B")
degree = -15  # weight_verbosity/weight_helpfulness = tan(-15°)
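# The diff hunk ends above; what follows is a minimal sketch of how the example
# might continue, not part of the shown diff. It assumes the tokenizer's chat
# template accepts a "system" role (if not, prepend the system prompt to the user
# message); the user prompt and generation settings are illustrative.
weight_helpfulness = int(np.round(100 * np.cos(np.radians(degree))))  # 97 for -15°
weight_verbosity = int(np.round(100 * np.sin(np.radians(degree))))    # -26 for -15°
# After rounding, (weight_verbosity/100)**2 + (weight_helpfulness/100)**2 is only
# approximately 1, which is the intended unit-norm constraint.
system_prompt = (
    "You are a helpful, respectful, and honest assistant who always responds to the "
    "user in a harmless way. Your response should maximize weighted rating = "
    f"helpfulness*{weight_helpfulness} + verbosity*{weight_verbosity}"
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain the photoelectric effect in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Sweeping `degree` from -45° to 45° keeps `weight_helpfulness` at or above 71 while moving `weight_verbosity` from -71 (penalize verbosity) to +71 (reward verbosity).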