- TinyGSM: achieving >80% on GSM8k with small language models
  Paper • 2312.09241 • Published • 39
- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 63
- Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
  Paper • 2403.18795 • Published • 20
- Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
  Paper • 2404.04478 • Published • 13
Collections including paper arxiv:2409.02097