Submitted by KevinQHLin 80 ShowUI: One Vision-Language-Action Model for GUI Visual Agent · 9 authors 3
Submitted by BestWishYsh 35 Identity-Preserving Text-to-Video Generation by Frequency Decomposition · 8 authors 3
Submitted by noamrot 32 Pathways on the Image Manifold: Image Editing via Video Generation · 6 authors 2
Submitted by SadilKhan 23 MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation · 7 authors 4
Submitted by shuaishuaicdp 21 Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment · 11 authors 2
Submitted by huangsiteng 19 Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration · 7 authors 2
Submitted by yifanzhang114 19 MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs · 12 authors 2
Submitted by cyw-3d 11 SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE · 5 authors 2
Submitted by sggetao 11 Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens · 6 authors 5
Submitted by tobiaslee 10 VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models · 12 authors 2
Submitted by arkimjh 7 SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis · 4 authors 2
Submitted by hhua2 7 FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity · 8 authors 2
Submitted by akhaliq 6 AnchorCrafter: Animate CyberAnchors Selling Your Products via Human-Object Interacting Video Generation · 10 authors 2
Submitted by SanghyeokLee 5 EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality · 3 authors 2
Submitted by phenixace 4 MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts · 9 authors 2
Submitted by yisol 3 Controllable Human Image Generation with Personalized Multi-Garments · 5 authors 2
Submitted by amanchadha 1 Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI) · 14 authors 2