Running 2.24k 2.24k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 8 items • Updated 18 days ago • 396
DeepSeek R1 (All Versions) Collection DeepSeek R1 - the most powerful reasoning open-source model - available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 29 items • Updated 1 day ago • 209
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 100
LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting Paper • 2412.00177 • Published Nov 29, 2024 • 7 • 3
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published Dec 16, 2024 • 41
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices Paper • 2411.10640 • Published Nov 16, 2024 • 45
deepseek-ai/DeepSeek-Coder-V2-Instruct Text Generation • Updated Aug 21, 2024 • 7.85k • • 590
internlm/internlm-xcomposer2-vl-7b Visual Question Answering • Updated Apr 12, 2024 • 1.87k • 80