Light-R1 Collection Surpassing R1-Distill from Scratch* with 70k Math Data through Curriculum SFT & DPO • 3 items • Updated 6 days ago • 9
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 14 items • Updated 1 day ago • 91
Jamba 1.5 Collection The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction-following foundation models • 2 items • Updated 4 days ago • 85
Article Releasing Common Corpus: the largest public domain dataset for training LLMs By Pclanglais • Mar 20, 2024 • 21