Submitted by akhaliq 51 An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels · 6 authors 2
Submitted by renll 38 Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling · 6 authors 4
Submitted by QHL067 29 Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models · 6 authors 1
Submitted by akhaliq 27 Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning · 9 authors 1
Submitted by akhaliq 25 DiTFastAttn: Attention Compression for Diffusion Transformer Models · 9 authors 1
Submitted by Fiaa 21 Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models · 8 authors 1
Submitted by Fiaa 20 MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding · 21 authors 2
Submitted by akhaliq 19 HelpSteer2: Open-source dataset for training top-performing reward models · 9 authors 3
Submitted by matthieufp 16 mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus · 8 authors 4
Submitted by akhaliq 16 CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery · 16 authors 4
Submitted by roman-bachmann 15 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities · 9 authors 2
Submitted by DrChiZhang 14 EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts · 7 authors 3
Submitted by akhaliq 10 Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs · 3 authors 2
Submitted by Fiaa 9 Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? · 5 authors 1
Submitted by weixifeng 8 TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation · 6 authors 1
Submitted by hwjiang 7 Real3D: Scaling Up Large Reconstruction Models with Real-World Images · 3 authors 1
Submitted by zaydzuhri 6 MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding · 4 authors 2
Submitted by justinxzhao 6 Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus · 3 authors 1
Submitted by afaji 6 CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark · 75 authors 1
Submitted by desaix 5 LRM-Zero: Training Large Reconstruction Models with Synthesized Data · 10 authors 1
Submitted by sumukhaithal6 5 Understanding Hallucinations in Diffusion Models through Mode Interpolation · 4 authors 1
Submitted by lcysyzxdxc 5 CMC-Bench: Towards a New Paradigm of Visual Signal Compression · 10 authors 2
Submitted by YfZ 5 Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation · 8 authors 2