Daily Papers

by AK and the research community

Jan 3

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Submitted by zwq2018 · 9 authors

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
Submitted by xichenhku · 6 authors

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Submitted by akhaliq · 17 authors

LTX-Video: Realtime Video Latent Diffusion
Submitted by akhaliq · 16 authors

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Submitted by CircleRadon · 12 authors

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Submitted by xiazhi · 2 authors

MLLM-as-a-Judge for Image Safety without Human Labeling
Submitted by ztwang · 15 authors

ProgCo: Program Helps Self-Correction of Large Language Models
Submitted by dongguanting · 6 authors

A3: Android Agent Arena for Mobile GUI Agents
Submitted by Yuxiang007 · 8 authors

MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
Submitted by mahirlabibdihan · 8 authors

Unifying Specialized Visual Encoders for Video Language Models
Submitted by tyleryzhu · 6 authors

Dynamic Scaling of Unit Tests for Code Reward Modeling
Submitted by KAKA22 · 6 authors

Nested Attention: Semantic-aware Attention Values for Concept Personalization
Submitted by orpatashnik · 6 authors

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
Submitted by Iceclear · 7 authors

MapQaTor: A System for Efficient Annotation of Map Query Datasets
Submitted by mahirlabibdihan · 3 authors

Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
Submitted by peihaowang · 7 authors

Population Aware Diffusion for Time Series Generation
Submitted by littlestone111 · 5 authors

Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
Submitted by lanczos · 6 authors

SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
Submitted by Harold328 · 6 authors