view article Article How to make NeuTTS-air generate over 200 seconds of audio in a single second. Nov 21 • 21
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans? Paper • 2512.13281 • Published 11 days ago • 63
Graph of Verification: Structured Verification of LLM Reasoning with Directed Acyclic Graphs Paper • 2506.12509 • Published Jun 14 • 2
Scaling Zero-Shot Reference-to-Video Generation Paper • 2512.06905 • Published 19 days ago • 28
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published 22 days ago • 168
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Paper • 2512.02589 • Published 24 days ago • 63
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published 25 days ago • 68
VIDEOP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published Nov 14 • 112
ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries Paper • 2511.14349 • Published Nov 18 • 17
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6 • 210
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation Paper • 2511.01163 • Published Nov 3 • 31
World Simulation with Video Foundation Models for Physical AI Paper • 2511.00062 • Published Oct 28 • 40
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents Paper • 2510.23691 • Published Oct 27 • 52
Gauss Gym Datasets Collection Datasets used for the gauss gym photorealistic simulator • 4 items • Updated Oct 17 • 8
FlashWorld: High-quality 3D Scene Generation within Seconds Paper • 2510.13678 • Published Oct 15 • 72
view article Article Introduction to MedVideoCap-55K: A New, Large-Scale, High-Quality Medical Video-Caption Pair Dataset Jun 25 • 10