Update README.md
README.md CHANGED
@@ -34,11 +34,11 @@ This is the model card of a 🤗 transformers model that has been pushed on the

Use the code below to get started with the model.

-
+ System Prompt:
+ Template: `"You are a helpful, respectful, and honest assistant who always responds to the user in a harmless way. Your response should maximize weighted rating = helpfulness*{weight_helpfulness} + verbosity*{weight_verbosity}"`
+ Value Choices: `weight_helpfulness` is an integer from 0 to 100, and `(weight_verbosity/100)**2 + (weight_helpfulness/100)**2 == 1`
+ The maximum `weight_helpfulness` is 100; the lowest suggested value is 71 (≈ 100/√2, at which helpfulness and verbosity receive equal weight).
+
+ The model will generate a response that implicitly maximizes the weighted rating `helpfulness*weight_helpfulness + verbosity*weight_verbosity`, where `helpfulness` and `verbosity` are two reward objectives that range from 0 to 100.

We suggest choosing the ratio `weight_verbosity/weight_helpfulness` first. For instance, suppose `weight_verbosity/weight_helpfulness` is equal to `tan(-15°)`:
```python
@@ -47,7 +47,7 @@ import torch
import numpy as np

# Here we show how to use the DPA model to generate a response to a user prompt.
-device = "cuda
+device = "cuda"
model = AutoModelForCausalLM.from_pretrained("Haoxiang-Wang/DPA-v1-Mistral-7B", torch_dtype=torch.bfloat16, device_map=device)
tokenizer = AutoTokenizer.from_pretrained("Haoxiang-Wang/DPA-v1-Mistral-7B")
degree = -15  # weight_verbosity/weight_helpfulness = tan(-15°)
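# The diff hunk ends above; what follows is a minimal sketch of how the example
# might continue, not part of the shown diff. It assumes the tokenizer's chat
# template accepts a "system" role (if not, prepend the system prompt to the user
# message); the user prompt and generation settings are illustrative.
weight_helpfulness = int(np.round(100 * np.cos(np.radians(degree))))  # 97 for -15°
weight_verbosity = int(np.round(100 * np.sin(np.radians(degree))))    # -26 for -15°
# After rounding, (weight_verbosity/100)**2 + (weight_helpfulness/100)**2 is only
# approximately 1, which is the intended unit-norm constraint.
system_prompt = (
    "You are a helpful, respectful, and honest assistant who always responds to the "
    "user in a harmless way. Your response should maximize weighted rating = "
    f"helpfulness*{weight_helpfulness} + verbosity*{weight_verbosity}"
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain the photoelectric effect in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Sweeping `degree` from -45° to 45° keeps `weight_helpfulness` at or above 71 while moving `weight_verbosity` from -71 (penalize verbosity) to +71 (reward verbosity).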