Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 89
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 24 days ago • 293
view article Article Releasing the largest multilingual open pretraining dataset By Pclanglais • Nov 13, 2024 • 98
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 488
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published Jun 11, 2024 • 38