A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 190
InternVL3.5 Collection This collection includes all released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). • 54 items • Updated Sep 28 • 103
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated Aug 21 • 429
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent Paper • 2507.02259 • Published Jul 3 • 5
view article Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL +4 Jun 3 • 96
view article Article Falcon-Edge: A series of powerful, universal, fine-tunable 1.58bit language models. May 15 • 36