-
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 257 -
Transformer^2: Self-adaptive LLMs
Paper • 2501.06252 • Published • 53 -
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
Paper • 2501.09012 • Published • 10 -
FAST: Efficient Action Tokenization for Vision-Language-Action Models
Paper • 2501.09747 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2502.10389
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 58 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 42 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 56
-
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper • 2407.08083 • Published • 29 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 59 -
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper • 2408.15237 • Published • 39 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 29
-
Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
Paper • 2401.09048 • Published • 10 -
Improving fine-grained understanding in image-text pre-training
Paper • 2401.09865 • Published • 17 -
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 60 -
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 74