prithivMLmods committed on
Commit aa2a757 (verified)
Parent(s): f96e738

Update README.md

Files changed (1):
  1. README.md (+1 −1)
README.md CHANGED
@@ -20,7 +20,7 @@ tags:
 
 # **Phi-4 o1 [ Chain of Thought Reasoning ]**
 
-phi-4 o1 ft is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small, capable models were trained with data focused on high quality and advanced reasoning.
+[Phi-4 O1 finetuned] from Microsoft's Phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach is to ensure that small, capable models are trained with high-quality data focused on advanced reasoning.
 
 phi-4 has adopted a robust safety post-training approach. This approach leverages a variety of both open-source and in-house generated synthetic datasets. The overall technique employed to do the safety alignment is a combination of SFT (Supervised Fine-Tuning) and iterative DPO (Direct Preference Optimization), including publicly available datasets focusing on helpfulness and harmlessness as well as various questions and answers targeted at multiple safety categories.
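The README text mentions that safety alignment combines SFT with iterative DPO (Direct Preference Optimization). As a hedged illustration only, not the model card's actual training code, the core DPO objective for a single preference pair can be sketched as follows; the function name and the scalar log-probability inputs are assumptions for the sketch:

```python
import math

def dpo_loss(pi_logp_chosen, pi_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Sketch of the DPO loss for one preference pair (illustrative only).

    Inputs are summed token log-probabilities of the preferred (chosen)
    and dispreferred (rejected) responses under the trainable policy (pi)
    and the frozen reference model (ref). beta controls how far the
    policy may drift from the reference.
    """
    chosen_ratio = pi_logp_chosen - ref_logp_chosen      # log pi/ref on chosen
    rejected_ratio = pi_logp_rejected - ref_logp_rejected  # log pi/ref on rejected
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)), written stably as log1p(exp(-logits))
    return math.log1p(math.exp(-logits))
```

In practice this would be computed over batches of tokenized sequences (e.g. with a framework such as TRL's DPOTrainer); the sketch only shows that the loss shrinks as the policy assigns relatively more probability to the chosen response than the reference does.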