STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Paper • 2501.02976 • Published Jan 6 • 54
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation Paper • 2501.01895 • Published Jan 3 • 51
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published Dec 24, 2024 • 75
Can LLMs Maintain Fundamental Abilities under KV Cache Compression? Paper • 2502.01941 • Published Feb 4 • 15
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer Paper • 2502.01105 • Published Feb 3 • 20
Enhancing Code Generation for Low-Resource Languages: No Silver Bullet Paper • 2501.19085 • Published Jan 31 • 5
MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm Paper • 2502.02358 • Published Feb 4 • 18
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Paper • 2502.04299 • Published Feb 6 • 18
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published Feb 6 • 29
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models Paper • 2502.06608 • Published Feb 10 • 33
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published 29 days ago • 184