hamishivi/tulu-v2.5-7b-uf-mean-7b-uf-rm-value

hamishivi committed on Jun 25, 2024

Commit b8fdd2e · verified · 1 Parent(s): 7f33f1a

Update README.md

Files changed (1)

README.md +1 -1

@@ -19,7 +19,7 @@ license: apache-2.0
 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
 This is a **value** model produced during the PPO training of [this](https://huggingface.co/hamishivi/tulu-v2.5-ppo-7b-uf-mean) model.
-It was initialised from the [Tulu v2.573B UltraFeedback RM](https://huggingface.co/hamishivi/tulu-v2.5-7b-uf-rm).
+It was initialised from the [Tulu v2.5 7B UltraFeedback RM](https://huggingface.co/hamishivi/tulu-v2.5-7b-uf-rm).
 We release the value model as it may provide a good starting point for additional research or improved decoding with our released PPO models.
 At time of writing, you may have to [install transformers from source](https://huggingface.co/docs/transformers/en/installation#install-from-source) to get the `LlamaForTokenClassification` class.