Block Transformer: Global-to-Local Language Modeling for Fast Inference · 9 authors · submitted by akhaliq · 40 upvotes · 1 comment
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration · 9 authors · submitted by akhaliq · 34 upvotes · 2 comments
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion · 6 authors · submitted by akhaliq · 22 upvotes · 2 comments
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning · 4 authors · submitted by akhaliq · 21 upvotes · 1 comment
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM · 6 authors · submitted by akhaliq · 18 upvotes · 2 comments
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes · 4 authors · submitted by akhaliq · 16 upvotes · 2 comments
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms · 8 authors · submitted by akhaliq · 14 upvotes
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs · 10 authors · submitted by akhaliq · 11 upvotes · 1 comment