Text Generation · Transformers · PyTorch · English · llama · conversational · text-generation-inference
Commit ea7d00e (verified), committed by natolambert · 1 parent: 883f3b0

Update README.md

Files changed (1)
  1. README.md +24 -22
README.md CHANGED
@@ -68,28 +68,30 @@ We have included a [chat template](https://huggingface.co/docs/transformers/main
 
 ## Model Family
 
-| DPO Models | PPO Models | Reward Models | Value Models |
-|-------------|-------------|---------------|---------------|
-| [tulu-v2.5-dpo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-mean) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value) |
-| = | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm) | [tulu-v2.5-13b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-13b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value) |
-| = | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm) | [tulu-v2.5-70b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-70b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value) |
-| = | [tulu-v2.5-ppo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean) | [tulu-v2.5-13b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-13b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value) |
-| = | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) * with extra prompts | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value) |
-| [tulu-v2.5-dpo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf-60k) | [tulu-v2.5-ppo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-hh-rlhf-60k) | [tulu-v2.5-13b-hh-rlhf-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-hh-rlhf-60k-rm) | |
-| [tulu-v2.5-dpo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023) | [tulu-v2.5-ppo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-chatbot-arena-2023) | [tulu-v2.5-13b-chatbot-arena-2023-rm](https://huggingface.co/allenai/tulu-v2.5-13b-chatbot-arena-2023-rm) | |
-| [tulu-v2.5-dpo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange-60k) | [tulu-v2.5-ppo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-stackexchange-60k) | [tulu-v2.5-13b-stackexchange-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-stackexchange-60k-rm) | |
-| [tulu-v2.5-dpo-13b-nectar](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-nectar) | [tulu-v2.5-ppo-13b-nectar-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-nectar-60k) | [tulu-v2.5-13b-nectar-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-nectar-60k-rm) | |
-| [tulu-v2.5-dpo-13b-helpsteer](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-helpsteer) | | | |
-| [tulu-v2.5-dpo-13b-shp2](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-shp2) | | | |
-| [tulu-v2.5-dpo-13b-stackexchange](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange) | | | |
-| [tulu-v2.5-dpo-13b-uf-overall](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-overall) | | | |
-| [tulu-v2.5-dpo-13b-capybara](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-capybara) | | | |
-| [tulu-v2.5-dpo-13b-prm-phase-2](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-prm-phase-2) | | | |
-| [tulu-v2.5-dpo-13b-hh-rlhf](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf) | | | |
-| [tulu-v2.5-dpo-13b-chatbot-arena-2024](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2024) | | | |
-| [tulu-v2.5-dpo-13b-alpacafarm-human-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-human-pref) | | | |
-| [tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref) | | | |
-| [tulu-v2.5-dpo-13b-argilla-orca-pairs](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-argilla-orca-pairs) | | | |
+[Tulu 2.5 Preference Split](https://huggingface.co/datasets/allenai/tulu-2.5-preference-data) | DPO Models | PPO Models | Reward Models | Value Models |
+|-------------|-------------|-------------|---------------|---------------|
+| ultrafeedback_mean_aspects | [tulu-v2.5-dpo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-mean) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-value) |
+| ultrafeedback_mean_aspects | = | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm) | [tulu-v2.5-13b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-13b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-mix-rm-value) |
+| ultrafeedback_mean_aspects | = | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm) | [tulu-v2.5-70b-preference-mix-rm](https://huggingface.co/allenai/tulu-v2.5-70b-preference-mix-rm) | [tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-mix-rm-value) |
+| ultrafeedback_mean_aspects | = | [tulu-v2.5-ppo-13b-uf-mean](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean) | [tulu-v2.5-13b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-13b-uf-rm) | [tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-13b-uf-rm-value) |
+| ultrafeedback_mean_aspects | = | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts) | [tulu-v2.5-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-70b-uf-rm) * with extra prompts | [tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts-value) |
+| hh_rlhf_60k | [tulu-v2.5-dpo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf-60k) | [tulu-v2.5-ppo-13b-hh-rlhf-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-hh-rlhf-60k) | [tulu-v2.5-13b-hh-rlhf-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-hh-rlhf-60k-rm) | |
+| chatbot_arena_2023 | [tulu-v2.5-dpo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2023) | [tulu-v2.5-ppo-13b-chatbot-arena-2023](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-chatbot-arena-2023) | [tulu-v2.5-13b-chatbot-arena-2023-rm](https://huggingface.co/allenai/tulu-v2.5-13b-chatbot-arena-2023-rm) | |
+| stack_exchange_60k | [tulu-v2.5-dpo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange-60k) | [tulu-v2.5-ppo-13b-stackexchange-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-stackexchange-60k) | [tulu-v2.5-13b-stackexchange-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-stackexchange-60k-rm) | |
+| nectar_60k | N/A | [tulu-v2.5-ppo-13b-nectar-60k](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-nectar-60k) | [tulu-v2.5-13b-nectar-60k-rm](https://huggingface.co/allenai/tulu-v2.5-13b-nectar-60k-rm) | |
+| nectar | [tulu-v2.5-dpo-13b-nectar](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-nectar) | | | |
+| helpsteer | [tulu-v2.5-dpo-13b-helpsteer](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-helpsteer) | | | |
+| shp2 | [tulu-v2.5-dpo-13b-shp2](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-shp2) | | | |
+| stack_exchange_paired | [tulu-v2.5-dpo-13b-stackexchange](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-stackexchange) | | | |
+| ultrafeedback_overall | [tulu-v2.5-dpo-13b-uf-overall](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-uf-overall) | | | |
+| capybara | [tulu-v2.5-dpo-13b-capybara](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-capybara) | | | |
+| prm800k_pairs_phase2 | [tulu-v2.5-dpo-13b-prm-phase-2](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-prm-phase-2) | | | |
+| hh_rlhf | [tulu-v2.5-dpo-13b-hh-rlhf](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-hh-rlhf) | | | |
+| chatbot_arena_2024 | [tulu-v2.5-dpo-13b-chatbot-arena-2024](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-chatbot-arena-2024) | | | |
+| alpaca_farm_human_pref | [tulu-v2.5-dpo-13b-alpacafarm-human-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-human-pref) | | | |
+| alpaca_farm_gpt4_pref | [tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-alpacafarm-gpt4-pref) | | | |
+| orca_dpo_pairs | [tulu-v2.5-dpo-13b-argilla-orca-pairs](https://huggingface.co/allenai/tulu-v2.5-dpo-13b-argilla-orca-pairs) | | | |
+
 
 ## Intended uses & limitations
 