Update README.md
README.md
CHANGED
@@ -68,28 +68,30 @@ We have included a [chat template](https://huggingface.co/docs/transformers/main
 
 ## Model Family
 
-| DPO Models | PPO Models | Reward Models | Value Models |
-
-| [tulu-v2.5-dpo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-mean) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value) |
-| = | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm) | [tulu-v2.5-13b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-13b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value) |
-| = | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm) | [tulu-v2.5-70b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-70b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value) |
-| = | [tulu-v2.5-ppo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean) | [tulu-v2.5-13b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-13b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value) |
-| = | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) * with extra prompts | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value) |
-| [tulu-v2.5-dpo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf-60k) | [tulu-v2.5-ppo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-hh-rlhf-60k) | [tulu-v2.5-13b-hh-rlhf-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-hh-rlhf-60k-rm) | |
-| [tulu-v2.5-dpo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023) | [tulu-v2.5-ppo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-chatbot-arena-2023) | [tulu-v2.5-13b-chatbot-arena-2023-rm](https://huggingface.co/allenai/tulu-v2.5-13b-chatbot-arena-2023-rm) | |
-| [tulu-v2.5-dpo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange-60k) | [tulu-v2.5-ppo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-stackexchange-60k) | [tulu-v2.5-13b-stackexchange-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-stackexchange-60k-rm) | |
-|
-| [tulu-v2.5-dpo-13b-
-| [tulu-v2.5-dpo-13b-
-| [tulu-v2.5-dpo-13b-
-| [tulu-v2.5-dpo-13b-
-| [tulu-v2.5-dpo-13b-
-| [tulu-v2.5-dpo-13b-
-| [tulu-v2.5-dpo-13b-
-| [tulu-v2.5-dpo-13b-
-| [tulu-v2.5-dpo-13b-
-| [tulu-v2.5-dpo-13b-alpacafarm-
-| [tulu-v2.5-dpo-13b-
+[Tulu 2.5 Preference Split](https://huggingface.co/datasets/allenai/tulu-2.5-preference-data) | DPO Models | PPO Models | Reward Models | Value Models |
+|-------------|-------------|-------------|---------------|---------------|
+| ultrafeedback_mean_aspects | [tulu-v2.5-dpo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-mean) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value) |
+| ultrafeedback_mean_aspects | = | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm) | [tulu-v2.5-13b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-13b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value) |
+| ultrafeedback_mean_aspects | = | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm) | [tulu-v2.5-70b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-70b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value) |
+| ultrafeedback_mean_aspects | = | [tulu-v2.5-ppo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean) | [tulu-v2.5-13b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-13b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value) |
+| ultrafeedback_mean_aspects | = | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) * with extra prompts | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value) |
+| hh_rlhf_60k | [tulu-v2.5-dpo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf-60k) | [tulu-v2.5-ppo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-hh-rlhf-60k) | [tulu-v2.5-13b-hh-rlhf-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-hh-rlhf-60k-rm) | |
+| chatbot_arena_2023 | [tulu-v2.5-dpo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023) | [tulu-v2.5-ppo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-chatbot-arena-2023) | [tulu-v2.5-13b-chatbot-arena-2023-rm](https://huggingface.co/allenai/tulu-v2.5-13b-chatbot-arena-2023-rm) | |
+| stack_exchange_60k | [tulu-v2.5-dpo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange-60k) | [tulu-v2.5-ppo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-stackexchange-60k) | [tulu-v2.5-13b-stackexchange-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-stackexchange-60k-rm) | |
+| nectar_60k | N/A | [tulu-v2.5-ppo-13b-nectar-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-nectar-60k) | [tulu-v2.5-13b-nectar-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-nectar-60k-rm) | |
+| nectar | [tulu-v2.5-dpo-13b-nectar](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-nectar) | | | |
+| helpsteer | [tulu-v2.5-dpo-13b-helpsteer](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-helpsteer) | | | |
+| shp2 | [tulu-v2.5-dpo-13b-shp2](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-shp2) | | | |
+| stack_exchange_paired | [tulu-v2.5-dpo-13b-stackexchange](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange) | | | |
+| ultrafeedback_overall | [tulu-v2.5-dpo-13b-uf-overall](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-overall) | | | |
+| capybara | [tulu-v2.5-dpo-13b-capybara](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-capybara) | | | |
+| prm800k_pairs_phase2 | [tulu-v2.5-dpo-13b-prm-phase-2](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-prm-phase-2) | | | |
+| hh_rlhf | [tulu-v2.5-dpo-13b-hh-rlhf](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf) | | | |
+| chatbot_arena_2024 | [tulu-v2.5-dpo-13b-chatbot-arena-2024](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2024) | | | |
+| alpaca_farm_human_pref | [tulu-v2.5-dpo-13b-alpacafarm-human-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-human-pref) | | | |
+| alpaca_farm_gpt4_pref | [tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref) | | | |
+| orca_dpo_pairs | [tulu-v2.5-dpo-13b-argilla-orca-pairs](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-argilla-orca-pairs) | | | |
+
 
 ## Intended uses & limitations
 