Daily Papers

by AK and the research community

Mar 11

Submitted by

razzant

Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders

·
8 authors

2

Submitted by

UglyToilet

SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models

·
10 authors

1

Submitted by

FanqingM

MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning

·
14 authors

2

Submitted by

tellarin

Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning

·
4 authors

2

Submitted by

weijiawu

Automated Movie Generation via Multi-Agent CoT Planning

·
3 authors

1

Submitted by

Seanie-lee

FedRand: Enhancing Privacy in Federated Learning with Randomized LoRA Subparameter Updates

·
4 authors

1

Submitted by

tianyic

DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs

·
7 authors

2

Submitted by

yiren98

EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer

·
5 authors

2

Submitted by

akhaliq

Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

·
8 authors

Submitted by

CharonBony

FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation

·
9 authors

6

Submitted by

BoZhang

SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing

·
7 authors

2

Submitted by

akhaliq

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

·
5 authors

Submitted by

AQuarterMile

WritingBench: A Comprehensive Benchmark for Generative Writing

·
11 authors

2

Submitted by

RTT1

MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning

·
12 authors

1

Submitted by

giulio98

Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning

·
4 authors

4

Submitted by

weilllllls

DreamRelation: Relation-Centric Video Customization

·
11 authors

1

Submitted by

BestWishYsh

VACE: All-in-One Video Creation and Editing

·
6 authors

2

Submitted by

gpx333

Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment

·
7 authors

Submitted by

lqniu

LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

·
5 authors

3

Submitted by

TokerZ

Agent models: Internalizing Chain-of-Action Generation into Reasoning models

·
5 authors

2

Submitted by

yyyou

Effective and Efficient Masked Image Generation Models

·
6 authors

2

Submitted by

adamdad

PE3R: Perception-Efficient 3D Reconstruction

·
3 authors

Submitted by

zszhong

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

·
7 authors

2

Submitted by

Llwo

This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs

·
3 authors

2

Submitted by

ryanchen42

Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

·
4 authors

2

Submitted by

xiaol

BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling

·
2 authors

2

Submitted by

wjkang

State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models

·
6 authors

2

Submitted by

SinclairSchneider

Detection Avoidance Techniques for Large Language Models

·
4 authors

1

Submitted by

nielsr

YOLOE: Real-Time Seeing Anything

·
6 authors

Submitted by

msadat97

Efficient Distillation of Classifier-Free Guidance using Adapters

·
2 authors

1

Submitted by

BestWishYsh

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation

·
9 authors

Submitted by

hammh0a

DiffCLIP: Differential Attention Meets CLIP

·
2 authors

2

Submitted by

sedrickkeh

Should VLMs be Pre-trained with Image Data?

·
11 authors

1

Submitted by

dxli1

ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks

·
7 authors

2

Submitted by

JeongHun0716

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations

·
5 authors

2

Submitted by

ddgoodgood

TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models

·
6 authors

1

Submitted by

xwen99

A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning

·
5 authors

2

Submitted by

hisoka94

Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs

·
3 authors

2

Submitted by

JianLiu99

Novel Object 6D Pose Estimation with a Single Reference View

·
8 authors

2

Submitted by

LorenaYannnnn

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries

·
2 authors

4

Submitted by

EvanTHU

HumanMM: Global Human Motion Recovery from Multi-shot Videos

·
11 authors

1

Submitted by

junkang0909

RePO: ReLU-based Preference Optimization

·
8 authors

1

Submitted by

Amoik

REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding

·
7 authors

Submitted by

XThomasBU

What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

·
2 authors

Submitted by

MingxingLi

Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model

·
5 authors

2

Submitted by

dinobby

Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning

·
5 authors

2

Submitted by

teelinsan

Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces

·
8 authors

2

Submitted by

raaec

PhiloBERTA: A Transformer-Based Cross-Lingual Analysis of Greek and Latin Lexicons

·
2 authors

2

Submitted by

KianYale

NeuGrasp: Generalizable Neural Surface Reconstruction with Background Priors for Material-Agnostic Object Grasp Detection

·
8 authors

2

Submitted by

mskrt

Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts

·
9 authors

2