# **Phi-4 o1 [ Chain of Thought Reasoning ]**

Phi-4 o1, finetuned from Microsoft's Phi-4, is a state-of-the-art open model built on a blend of synthetic datasets, data from filtered public-domain websites, and acquired academic books and Q&A datasets. This approach aims to ensure that small, capable models are trained with high-quality data focused on advanced reasoning.

Phi-4 adopts a robust safety post-training approach that leverages a variety of open-source and in-house generated synthetic datasets. Safety alignment combines SFT (Supervised Fine-Tuning) with iterative DPO (Direct Preference Optimization), drawing on publicly available datasets focused on helpfulness and harmlessness as well as questions and answers targeting multiple safety categories.
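As a rough illustration of the iterative DPO step mentioned above (a minimal sketch, not the actual training code; the function name and `beta` default are assumptions for the example), the per-pair DPO objective can be written with plain Python:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-preference-pair DPO loss (illustrative sketch).

    Inputs are log-probabilities of the chosen and rejected responses
    under the policy being trained and under a frozen reference model.
    """
    # Scaled difference of the policy's and reference's log-ratios.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(margin): small when the policy prefers the chosen
    # response more strongly than the reference model does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy agrees exactly with the reference, the margin is zero and the loss is `log 2`; pushing probability toward the chosen response drives the loss toward zero, which is what each DPO iteration optimizes over the preference data.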