allenai
/

tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm

@@ -66,6 +66,31 @@ Your message here!
 For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`, this can affect generation quality quite a bit.**
 We have included a [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) in the tokenizer implementing this template.
 ## Intended uses & limitations
 The model was initially fine-tuned on a filtered and preprocessed of the [Tulu V2 mix dataset](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture), which contains a diverse range of human created instructions and synthetic dialogues generated primarily by other LLMs.

 For best results, format all inputs in this manner. **Make sure to include a newline after `<|assistant|>`, this can affect generation quality quite a bit.**
 We have included a [chat template](https://huggingface.co/docs/transformers/main/en/chat_templating) in the tokenizer implementing this template.
+## Model Family
+| DPO Models | PPO Models | Reward Models | Value Models |
+|-------------|-------------|---------------|---------------|
+| [tulu-v2.5-dpo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-mean) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value) |
+|  =  | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm) | [tulu-v2.5-13b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-13b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value) |
+|  =  | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm) | [tulu-v2.5-70b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-70b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value) |
+|  =  | [tulu-v2.5-ppo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean) | [tulu-v2.5-13b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-13b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value) |
+|  =  | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) * with extra prompts | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value) |
+| [tulu-v2.5-dpo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf-60k) | [tulu-v2.5-ppo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-hh-rlhf-60k) | [tulu-v2.5-13b-hh-rlhf-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-hh-rlhf-60k-rm) |  |
+| [tulu-v2.5-dpo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023) | [tulu-v2.5-ppo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-chatbot-arena-2023) | [tulu-v2.5-13b-chatbot-arena-2023-rm](https://huggingface.co/allenai/tulu-v2.5-13b-chatbot-arena-2023-rm) |  |
+| [tulu-v2.5-dpo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange-60k) | [tulu-v2.5-ppo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-stackexchange-60k) | [tulu-v2.5-13b-stackexchange-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-stackexchange-60k-rm) |  |
+| [tulu-v2.5-dpo-13b-nectar](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-nectar) | [tulu-v2.5-ppo-13b-nectar-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-nectar-60k) | [tulu-v2.5-13b-nectar-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-nectar-60k-rm) |  |
+| [tulu-v2.5-dpo-13b-helpsteer](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-helpsteer) |  |  |  |
+| [tulu-v2.5-dpo-13b-shp2](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-shp2) |  |  |  |
+| [tulu-v2.5-dpo-13b-stackexchange](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange) |  |  |  |
+| [tulu-v2.5-dpo-13b-uf-overall](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-overall) |  |  |  |
+| [tulu-v2.5-dpo-13b-capybara](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-capybara) |  |  |  |
+| [tulu-v2.5-dpo-13b-prm-phase-2](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-prm-phase-2) |  |  |  |
+| [tulu-v2.5-dpo-13b-hh-rlhf](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf) |  |  |  |
+| [tulu-v2.5-dpo-13b-chatbot-arena-2024](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2024) |  |  |  |
+| [tulu-v2.5-dpo-13b-alpacafarm-human-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-human-pref) |  |  |  |
+| [tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref) |  |  |  |
+| [tulu-v2.5-dpo-13b-argilla-orca-pairs](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-argilla-orca-pairs) |  |  |  |
 ## Intended uses & limitations
 The model was initially fine-tuned on a filtered and preprocessed of the [Tulu V2 mix dataset](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture), which contains a diverse range of human created instructions and synthetic dialogues generated primarily by other LLMs.