Daily Papers

by AK and the research community

Dec 9

Submitted by

czczup

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

·
40 authors

Submitted by

taesiri

EXAONE 3.5: Series of Large Language Models for Real-world Use Cases

·
33 authors

Submitted by

yuexiang96

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

·
10 authors

Submitted by

CodeGoat24

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment

·
6 authors

Submitted by

hanqing666

APOLLO: SGD-like Memory, AdamW-level Performance

·
10 authors

Submitted by

thuanz123

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

·
5 authors

Submitted by

ChenYi99

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

·
7 authors

Submitted by

NinaKarine

GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration

·
6 authors

Submitted by

xchen16

CompCap: Improving Multimodal Large Language Models with Composite Captions

·
11 authors

Submitted by

EthanTaylor

Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction

·
4 authors

Submitted by

joanrodai

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

·
43 authors

Submitted by

avinashpaliwal

PanoDreamer: 3D Panorama Synthesis from a Single Image

·
4 authors

Submitted by

BestWishYsh

Mind the Time: Temporally-Controlled Multi-Event Video Generation

·
8 authors

Submitted by

Valentina-Zhang

2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction

·
6 authors

Submitted by

iiiiwis

DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling

·
8 authors

Submitted by

hsikchi

RL Zero: Zero-Shot Language to Behaviors without any Supervision

·
9 authors