models
jinaai/ReaderLM-v2 • Text Generation • 2B • Updated Mar 4, 2025 • 17.1k • 772
m-a-p/YuE-s1-7B-anneal-en-cot • Text Generation • 6B • Updated Mar 12, 2025 • 4.62k • 443
starvector/starvector-1b-im2svg • Text Generation • 1B • Updated Mar 19, 2025 • 74.3k • 185
stepfun-ai/Step1X-Edit • Image-to-Image • Updated Jul 9, 2025 • 79 • 327
papers
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models • Paper • 2402.03300 • Published Feb 5, 2024 • 142
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published Jan 28, 2025 • 125
The Ultra-Scale Playbook 🌌 • Running • 3.75k • The ultimate guide to training LLM on large GPU Clusters
The Ultimate Guide to LLM Training | The Ultra-Scale Playbook 🔥 • Running • 263 • Learn every aspect of LLM training