- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 24
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 26
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 105
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4

Collections including paper arxiv:2502.06703

- Towards Modular LLMs by Building and Reusing a Library of LoRAs
  Paper • 2405.11157 • Published • 28
- Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
  Paper • 2406.12034 • Published • 15
- FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
  Paper • 2407.04051 • Published • 36
- OLMoE: Open Mixture-of-Experts Language Models
  Paper • 2409.02060 • Published • 78

- Chain-of-Verification Reduces Hallucination in Large Language Models
  Paper • 2309.11495 • Published • 37
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 85
- Language Modeling Is Compression
  Paper • 2309.10668 • Published • 83

- CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
  Paper • 2502.04416 • Published • 10
- Competitive Programming with Large Reasoning Models
  Paper • 2502.06807 • Published • 54
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Paper • 2502.06703 • Published • 118

- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 53
- LLMs Can Easily Learn to Reason from Demonstrations. Structure, not content, is what matters!
  Paper • 2502.07374 • Published • 27
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Paper • 2502.06703 • Published • 118

- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 132
- Qwen2.5 Technical Report
  Paper • 2412.15115 • Published • 345
- Are Your LLMs Capable of Stable Reasoning?
  Paper • 2412.13147 • Published • 92
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 92

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 25
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 99
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 25
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 97

- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 37
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 46
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 35
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 46

- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 53
- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 100
- Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
  Paper • 2502.06703 • Published • 118
- Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
  Paper • 2501.12370 • Published • 11