Submitted by Jinyang23 35 Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS · 6 authors 14
Submitted by daixuancheng 27 On Domain-Specific Post-Training for Multimodal Large Language Models · 8 authors 3
Submitted by kinam0252 24 Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling · 5 authors 3
Submitted by akhaliq 19 Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model · 9 authors 2
Submitted by happy0612 18 FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion · 7 authors 2
Submitted by akhaliq 11 Scaling Transformers for Low-Bitrate High-Quality Speech Coding · 7 authors 3
Submitted by arkimjh 11 Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing · 4 authors 2
Submitted by junwann 10 DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding · 8 authors 2
Submitted by Vishnou 8 MATATA: a weak-supervised MAthematical Tool-Assisted reasoning for Tabular Applications · 3 authors 2
Submitted by akhaliq 8 AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers · 8 authors 2
Submitted by TajaKuzman 6 LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification · 2 authors 2
Submitted by hyz317 6 AlphaTablets: A Generic Plane Representation for 3D Planar Reconstruction from Monocular Videos · 7 authors 2