Submitted by BiaoGong 55 Animate-X: Universal Character Image Animation with Enhanced Motion Representation · 9 authors 5
Submitted by beccabai 55 LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models · 15 authors 4
Submitted by richardxp888 52 MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models · 12 authors 4
Submitted by dongguanting 48 Toward General Instruction-Following Alignment for Retrieval-Augmented Generation · 6 authors 3
Submitted by wenhu 39 MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks · 16 authors 3
Submitted by KbsdJames 32 Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models · 20 authors 3
Submitted by LituRout 30 Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations · 6 authors 3
Submitted by wlin21at 27 LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content · 11 authors 2
Submitted by Cuiunbo 26 VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents · 11 authors 3
Submitted by ir1d 26 Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention · 8 authors 4
Submitted by akhaliq 19 Thinking LLMs: General Instruction Following with Thought Generation · 6 authors 4
Submitted by mucai 17 TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models · 15 authors 2
Submitted by Tigerph 17 Rethinking Data Selection at Scale: Random Selection is Almost All You Need · 8 authors 3
Submitted by xiaowu0162 10 LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory · 6 authors 2
Submitted by zengziyun 9 MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models · 8 authors 2
Submitted by ArmelRandy 9 Tree of Problems: Improving structured problem solving with compositionality · 3 authors 2
Submitted by Guangxuan-Xiao 7 DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads · 8 authors 2
Submitted by yjze 7 Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies · 8 authors 2
Submitted by ruochenz 5 The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling · 5 authors 2
Submitted by nandan523 4 ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models · 2 authors 2