Daily Papers

by AK and the research community

Mar 5

Submitted by

taesiri

Helios: Real Real-Time Long Video Generation Model

ByteDance

Submitted by

Qinsi1

T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning

15 authors

Submitted by

hzxllll

Heterogeneous Agent Collaborative Reinforcement Learning

ByteDance

Submitted by

taesiri

Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

MicrosoftResearch

Microsoft Research

Submitted by

zstanjj

MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

6 authors

Submitted by

tqliu

ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors

10 authors

Submitted by

taesiri

Phi-4-reasoning-vision-15B Technical Report

microsoft

Submitted by

ztwang

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

4 authors

Submitted by

harman

V_1: Unifying Generation and Self-Verification for Parallel Reasoners

Berkeley

Submitted by

l-li

CubeComposer: Spatio-Temporal Autoregressive 4K 360° Video Generation from Perspective Video

TencentARC

ARC Lab, Tencent PCG

Submitted by

higokri

AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

4 authors

Submitted by

nanamma

RIVER: A Real-Time Interaction Benchmark for Video LLMs

OpenGVLab

Submitted by

Franck-Dernoncourt

InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

30 authors

Submitted by

taesiri

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

5 authors

Submitted by

s-angheben

Specificity-aware reinforcement learning for fine-grained open-world classification

MHUGLab

Multimedia and Human Understanding Group

Submitted by

linyueqian

MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

5 authors

Submitted by

onandon

EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding

4 authors

Submitted by

mjbuehler

BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning

lamm-mit

LAMM: MIT Laboratory for Atomistic and Molecular Mechanics

Submitted by

m-hamza-mughal

MIBURI: Towards Expressive Interactive Gesture Synthesis

4 authors

Submitted by

yutyang1

GroupEnsemble: Efficient Uncertainty Estimation for DETR-based Object Detection

Mercedes-Benz

Mercedes-Benz AG

Submitted by

HaoZ416

HDINO: A Concise and Efficient Open-Vocabulary Detector

5 authors