Gemini Embedding: Generalizable Embeddings from Gemini • Paper • 2503.07891 • Published 3 days ago • 23
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization • Paper • 2503.06698 • Published 4 days ago • 2
Words or Vision: Do Vision-Language Models Have Blind Faith in Text? • Paper • 2503.02199 • Published 10 days ago • 7
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment • Paper • 2503.07334 • Published 3 days ago • 13
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models • Paper • 2503.07605 • Published 3 days ago • 62
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders • Paper • 2503.03601 • Published 8 days ago • 203
Forgetting Transformer: Softmax Attention with a Forget Gate • Paper • 2503.02130 • Published 10 days ago • 26
How to Steer LLM Latents for Hallucination Detection? • Paper • 2503.01917 • Published 12 days ago • 10
Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions • Paper • 2503.03278 • Published 9 days ago • 12
ABC: Achieving Better Control of Multimodal Embeddings using VLMs • Paper • 2503.00329 • Published 13 days ago • 18
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content • Paper • 2503.02357 • Published 10 days ago • 7
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface • Paper • 2503.01342 • Published 11 days ago • 7
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression • Paper • 2503.02812 • Published 9 days ago • 9
SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models • Paper • 2503.02876 • Published 9 days ago • 4
How far can we go with ImageNet for Text-to-Image generation? • Paper • 2502.21318 • Published 13 days ago • 25
Tell me why: Visual foundation models as self-explainable classifiers • Paper • 2502.19577 • Published 15 days ago • 10