Update README.md
--- a/README.md
+++ b/README.md
@@ -2,12 +2,16 @@
 license: apache-2.0
 language:
 - en
+datasets:
+- allenai/olmo-2-0325-32b-preference-mix
+base_model:
+- allenai/OLMo-2-0325-32B-SFT
 pipeline_tag: text-generation
 ---
 
 <img alt="OLMo Logo" src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/olmo2/olmo.png" width="242px">
 
-OLMo 2 32B Instruct March 2025 is a post-trained variant of the [OLMo-2 32B March 2025](https://huggingface.co/allenai/OLMo-2-0325-32B/) model, which has undergone supervised finetuning on an OLMo-specific variant of the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture) and further DPO training on [this dataset](https://huggingface.co/datasets/allenai/olmo-2-32b-
+OLMo 2 32B Instruct March 2025 is a post-trained variant of the [OLMo-2 32B March 2025](https://huggingface.co/allenai/OLMo-2-0325-32B/) model, which has undergone supervised finetuning on an OLMo-specific variant of the [Tülu 3 dataset](https://huggingface.co/datasets/allenai/tulu-3-sft-olmo-2-mixture) and further DPO training on [this dataset](https://huggingface.co/datasets/allenai/olmo-2-0325-32b-preference-mix).
 Tülu 3 is designed for state-of-the-art performance on a diversity of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
 Check out the [OLMo 2 paper](https://arxiv.org/abs/2501.00656) or [Tülu 3 paper](https://arxiv.org/abs/2411.15124) for more details!
 
@@ -20,7 +24,7 @@ These models are trained on the Dolma dataset. We are releasing all code, checkp
 - **Model type:** A model trained on a mix of publicly available, synthetic and human-created datasets.
 - **Language(s) (NLP):** Primarily English
 - **License:** Apache 2.0
-- **Finetuned from model:** allenai/OLMo-2-0325-32B
+- **Finetuned from model:** allenai/OLMo-2-0325-32B-SFT
 
 ### Model Sources
 
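For context, a minimal sketch of running the instruct model this card describes with Hugging Face `transformers`. The repo id `allenai/OLMo-2-0325-32B-Instruct` and the presence of a chat template are assumptions here, not details specified by this commit:

```python
# Minimal sketch: load and prompt the OLMo 2 32B Instruct model.
# Assumption: the model is published as allenai/OLMo-2-0325-32B-Instruct
# and ships a chat template; neither detail comes from this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-0325-32B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Format a single-turn conversation with the model's chat template.
messages = [{"role": "user", "content": "What is OLMo 2?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a short continuation and decode only the new tokens.
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```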