Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based Qwen2.5 • 3 items • Updated 7 days ago • 76
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning Paper • 2503.18013 • Published 11 days ago • 18
Utilities Collection No crazy stuff, but useful ones for in-between steps • 16 items • Updated 15 days ago • 7
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 16 days ago • 112
Automated Movie Generation via Multi-Agent CoT Planning Paper • 2503.07314 • Published 24 days ago • 42
Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling Paper • 2503.08605 • Published 23 days ago • 26
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published Feb 27 • 28
Wan2.1 14B 480p I2V LoRAs Collection A collection of Remade's Wan2.1 14B 480p I2V LoRAs • 39 items • Updated 2 days ago • 90
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation Paper • 2503.04872 • Published 28 days ago • 14
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published 27 days ago • 52
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Paper • 2503.05379 • Published 27 days ago • 33
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning Paper • 2503.05592 • Published 27 days ago • 25
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion Paper • 2503.01183 • Published Mar 3 • 26