We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret we can do it together in the open!
🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.
🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.
🔥 Step 3: show we can go from base model -> SFT -> RL via multi-stage training.
Community fine-tuned models are more carbon efficient than the models they are derived from! 🥳🌿
@alozowski@clefourrier@SaylorTwift@albertvillanova evaluated CO₂ emissions associated with model inference for over 3000 models on the Open LLM Leaderboard. Interesting trends and new insights emerged...👀