Self-Rewarding Vision-Language Model via Reasoning Decomposition • Paper • 2508.19652 • Published 6 days ago • 77
MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs • Paper • 2508.18264 • Published 8 days ago • 26
Speech-to-LaTeX: New Models and Datasets for Converting Spoken Equations and Sentences • Paper • 2508.03542 • Published 28 days ago • 4
When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs • Paper • 2508.03365 • Published 28 days ago • 2
Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System • Paper • 2508.06059 • Published 26 days ago • 4
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs • Paper • 2508.06601 • Published 25 days ago • 5
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents • Paper • 2508.05954 • Published 26 days ago • 6
GLiClass: Generalist Lightweight Model for Sequence Classification Tasks • Paper • 2508.07662 • Published 23 days ago • 8
Shortcut Learning in Generalist Robot Policies: The Role of Dataset Diversity and Fragmentation • Paper • 2508.06426 • Published 25 days ago • 10
VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding • Paper • 2508.07493 • Published 23 days ago • 8
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs • Paper • 2508.05257 • Published 26 days ago • 12
Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control • Paper • 2508.08134 • Published 22 days ago • 9
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning • Paper • 2508.07101 • Published 24 days ago • 13
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning • Paper • 2508.08221 • Published 22 days ago • 43
Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future • Paper • 2508.06026 • Published 26 days ago • 15
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts • Paper • 2508.07785 • Published 22 days ago • 25
OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks • Paper • 2508.05614 • Published 26 days ago • 19