- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 259
- Audiobox: Unified Audio Generation with Natural Language Prompts
  Paper • 2312.15821 • Published • 16
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
  Paper • 2312.16862 • Published • 31
- LLaMA Pro: Progressive LLaMA with Block Expansion
  Paper • 2401.02415 • Published • 53
Collections
Collections including paper arxiv:2312.16862

- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 259
- 3D-LFM: Lifting Foundation Model
  Paper • 2312.11894 • Published • 15
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 58
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
  Paper • 2312.16862 • Published • 31

- UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
  Paper • 2311.09257 • Published • 48
- VideoPoet: A Large Language Model for Zero-Shot Video Generation
  Paper • 2312.14125 • Published • 46
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
  Paper • 2312.16862 • Published • 31
- VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
  Paper • 2401.01256 • Published • 21

- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
  Paper • 2311.10093 • Published • 58
- NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
  Paper • 2311.12229 • Published • 27
- Diffusion Model Alignment Using Direct Preference Optimization
  Paper • 2311.12908 • Published • 50
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
  Paper • 2312.00845 • Published • 39

- De-Diffusion Makes Text a Strong Cross-Modal Interface
  Paper • 2311.00618 • Published • 23
- The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
  Paper • 2311.10093 • Published • 58
- Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
  Paper • 2311.13231 • Published • 29
- Diffusion Model Alignment Using Direct Preference Optimization
  Paper • 2311.12908 • Published • 50

- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
  Paper • 2311.00571 • Published • 41
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 50
- Ziya-VL: Bilingual Large Vision-Language Model via Multi-Task Instruction Tuning
  Paper • 2310.08166 • Published • 1
- Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
  Paper • 2310.00653 • Published • 3