Update README.md
README.md
@@ -66,6 +66,31 @@ Your message here!
For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`; this can affect generation quality quite a bit.**
We have included a [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) in the tokenizer that implements this template.

+## Model Family
+
+| DPO Models | PPO Models | Reward Models | Value Models |
+|-------------|-------------|---------------|---------------|
+| [tulu-v2.5-dpo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-mean) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value) |
+| = | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm) | [tulu-v2.5-13b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-13b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value) |
+| = | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm) | [tulu-v2.5-70b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-70b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value) |
+| = | [tulu-v2.5-ppo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean) | [tulu-v2.5-13b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-13b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value) |
+| = | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) * with extra prompts | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value) |
+| [tulu-v2.5-dpo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf-60k) | [tulu-v2.5-ppo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-hh-rlhf-60k) | [tulu-v2.5-13b-hh-rlhf-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-hh-rlhf-60k-rm) | |
+| [tulu-v2.5-dpo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023) | [tulu-v2.5-ppo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-chatbot-arena-2023) | [tulu-v2.5-13b-chatbot-arena-2023-rm](https://huggingface.co/allenai/tulu-v2.5-13b-chatbot-arena-2023-rm) | |
+| [tulu-v2.5-dpo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange-60k) | [tulu-v2.5-ppo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-stackexchange-60k) | [tulu-v2.5-13b-stackexchange-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-stackexchange-60k-rm) | |
+| [tulu-v2.5-dpo-13b-nectar](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-nectar) | [tulu-v2.5-ppo-13b-nectar-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-nectar-60k) | [tulu-v2.5-13b-nectar-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-nectar-60k-rm) | |
+| [tulu-v2.5-dpo-13b-helpsteer](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-helpsteer) | | | |
+| [tulu-v2.5-dpo-13b-shp2](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-shp2) | | | |
+| [tulu-v2.5-dpo-13b-stackexchange](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange) | | | |
+| [tulu-v2.5-dpo-13b-uf-overall](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-overall) | | | |
+| [tulu-v2.5-dpo-13b-capybara](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-capybara) | | | |
+| [tulu-v2.5-dpo-13b-prm-phase-2](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-prm-phase-2) | | | |
+| [tulu-v2.5-dpo-13b-hh-rlhf](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf) | | | |
+| [tulu-v2.5-dpo-13b-chatbot-arena-2024](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2024) | | | |
+| [tulu-v2.5-dpo-13b-alpacafarm-human-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-human-pref) | | | |
+| [tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref) | | | |
+| [tulu-v2.5-dpo-13b-argilla-orca-pairs](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-argilla-orca-pairs) | | | |
+
## Intended uses & limitations

The model was initially fine-tuned on a filtered and preprocessed version of the [Tulu V2 mix dataset](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture), which contains a diverse range of human-created instructions and synthetic dialogues generated primarily by other LLMs.
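
The formatting advice in the README context lines (a newline after `<|assistant|>`) can be illustrated with a small sketch. The full template is not shown in this hunk, so the `<|user|>` tag, the one-tag-per-line layout, and the helper name `build_prompt` are assumptions made for illustration; the chat template bundled with the tokenizer remains the authoritative source, and this sketch omits any end-of-sequence handling for earlier assistant turns.

```python
# Sketch only: the role tags and layout are assumptions inferred from the
# README text quoted in the diff, not the tokenizer's actual chat template.

def build_prompt(messages):
    """Render chat messages into a single prompt string.

    Each message is a dict with "role" ("user" or "assistant") and "content".
    The prompt always ends with "<|assistant|>\n" so that generation starts
    on a fresh line, as the README recommends.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        # One tag line, then the message text, then a newline.
        parts.append(f"<|{role}|>\n{message['content'].strip()}\n")
    # The trailing newline after the assistant tag is the detail the README stresses.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_prompt([{"role": "user", "content": "Your message here!"}])
print(repr(prompt))
```

In practice it is probably simpler to call the tokenizer's `apply_chat_template` method from `transformers` with `add_generation_prompt=True`, which applies the bundled template directly instead of rebuilding it by hand.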