papers
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper
• 2311.09257
• Published • 47
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Paper
• 2310.04378
• Published • 22
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Paper
• 2309.14717
• Published • 46
Exponentially Faster Language Modelling
Paper
• 2311.10770
• Published • 119
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper
• 2312.00752
• Published • 150
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Paper
• 2402.08609
• Published • 36
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Paper
• 2402.10644
• Published • 81
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper
• 2402.14905
• Published • 134
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper
• 2403.03507
• Published • 190
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper
• 2403.03853
• Published • 66
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper
• 2403.09394
• Published • 26
Rho-1: Not All Tokens Are What You Need
Paper
• 2404.07965
• Published • 94
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Paper
• 2406.12034
• Published • 16
RegMix: Data Mixture as Regression for Language Model Pre-training
Paper
• 2407.01492
• Published • 40
Layerwise Recurrent Router for Mixture-of-Experts
Paper
• 2408.06793
• Published • 32
MaskBit: Embedding-free Image Generation via Bit Tokens
Paper
• 2409.16211
• Published • 17
Randomized Autoregressive Visual Generation
Paper
• 2411.00776
• Published • 18
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
Paper
• 2502.08910
• Published • 150
Slamming: Training a Speech Language Model on One GPU in a Day
Paper
• 2502.15814
• Published • 69