Submitted by akhaliq · 608 · The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits · 10 authors · 142
Submitted by akhaliq · 191 · EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions · 4 authors · 20
Submitted by akhaliq · 87 · Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models · 12 authors · 5
Submitted by akhaliq · 24 · When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method · 4 authors · 3
Submitted by akhaliq · 23 · OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web · 7 authors · 6
Submitted by akhaliq · 22 · DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model · 6 authors · 1
Submitted by akhaliq · 16 · Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners · 5 authors · 1
Submitted by akhaliq · 11 · Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation · 6 authors · 1
Submitted by akhaliq · 10 · VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction · 11 authors · 45