natolambert
committed on
Update README.md
README.md
CHANGED
@@ -68,13 +68,13 @@ We have included a [chat template](https://huggingface.co/docs/transformers/main
## Model Family
-[
|-------------|-------------|-------------|---------------|---------------|
| ultrafeedback_mean_aspects | [tulu-v2.5-dpo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-mean) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value) |
-
-
| ultrafeedback_mean_aspects | = | [tulu-v2.5-ppo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean) | [tulu-v2.5-13b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-13b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value) |
-
| hh_rlhf_60k | [tulu-v2.5-dpo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf-60k) | [tulu-v2.5-ppo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-hh-rlhf-60k) | [tulu-v2.5-13b-hh-rlhf-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-hh-rlhf-60k-rm) | |
| chatbot_arena_2023 | [tulu-v2.5-dpo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023) | [tulu-v2.5-ppo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-chatbot-arena-2023) | [tulu-v2.5-13b-chatbot-arena-2023-rm](https://huggingface.co/allenai/tulu-v2.5-13b-chatbot-arena-2023-rm) | |
| stack_exchange_60k | [tulu-v2.5-dpo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange-60k) | [tulu-v2.5-ppo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-stackexchange-60k) | [tulu-v2.5-13b-stackexchange-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-stackexchange-60k-rm) | |
@@ -92,6 +92,7 @@ We have included a [chat template](https://huggingface.co/docs/transformers/main
| alpaca_farm_gpt4_pref | [tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref) | | | |
| orca_dpo_pairs | [tulu-v2.5-dpo-13b-argilla-orca-pairs](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-argilla-orca-pairs) | | | |
## Intended uses & limitations
@@ -68,13 +68,13 @@
## Model Family
+| [Preference Data](https://huggingface.co/datasets/allenai/tulu-2.5-preference-data), [Prompts Data](https://huggingface.co/datasets/allenai/tulu-2.5-prompts) | DPO Models | PPO Models | Reward Models | Value Models |
|-------------|-------------|-------------|---------------|---------------|
| ultrafeedback_mean_aspects | [tulu-v2.5-dpo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-mean) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value) |
+| preference_big_mixture | = | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm) | [tulu-v2.5-13b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-13b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value) |
+| preference_big_mixture | = | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm) | [tulu-v2.5-70b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-70b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value) |
| ultrafeedback_mean_aspects | = | [tulu-v2.5-ppo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean) | [tulu-v2.5-13b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-13b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value) |
+| preference_big_mixture | = | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) * with extra prompts | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value) |
| hh_rlhf_60k | [tulu-v2.5-dpo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf-60k) | [tulu-v2.5-ppo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-hh-rlhf-60k) | [tulu-v2.5-13b-hh-rlhf-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-hh-rlhf-60k-rm) | |
| chatbot_arena_2023 | [tulu-v2.5-dpo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023) | [tulu-v2.5-ppo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-chatbot-arena-2023) | [tulu-v2.5-13b-chatbot-arena-2023-rm](https://huggingface.co/allenai/tulu-v2.5-13b-chatbot-arena-2023-rm) | |
| stack_exchange_60k | [tulu-v2.5-dpo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange-60k) | [tulu-v2.5-ppo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-stackexchange-60k) | [tulu-v2.5-13b-stackexchange-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-stackexchange-60k-rm) | |
@@ -92,6 +92,7 @@
| alpaca_farm_gpt4_pref | [tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref) | | | |
| orca_dpo_pairs | [tulu-v2.5-dpo-13b-argilla-orca-pairs](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-argilla-orca-pairs) | | | |
+*The extra prompts are all prompts in the prompts dataset. By default, only the `ultrafeedback_prompts` split is used.
## Intended uses & limitations