Submitted by myownskyW7 · 46 · PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction · 11 authors · 2
Submitted by michaelryoo · 18 · xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs · 10 authors · 2
Submitted by xing0047 · 17 · Mitigating Object Hallucination via Concentric Causal Attention · 4 authors · 2
Submitted by AtsuMiyai · 15 · JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation · 8 authors · 2
Submitted by t1101675 · 15 · MiniPLM: Knowledge Distillation for Pre-Training Language Models · 5 authors · 2
Submitted by OliverSieberling · 9 · EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search · 4 authors · 2
Submitted by bryanchrist · 8 · Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes · 4 authors · 2
Submitted by Xi8006 · 5 · 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors · 3 authors · 2