Article Introducing smolagents: simple agents that write actions in code. Dec 31, 2024 • 569
Article Superposition in Transformers: A Novel Way of Building Mixture of Experts By BenChaliah • Jan 4 • 14
Scaling Test-Time Compute with Open Models Collection Models and datasets used in our blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute • 10 items • Updated Jan 6 • 23
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages Paper • 2410.01036 • Published Oct 1, 2024 • 14
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 227
Article Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth By mlabonne • Jul 29, 2024 • 271
Article Memory-efficient Diffusion Transformers with Quanto and Diffusers Jul 30, 2024 • 63
Article Extracting Concepts from LLMs: Anthropic’s recent discoveries 📖 By m-ric • Jun 20, 2024 • 26
Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • May 7, 2024 • 44
Article SeeMoE: Implementing a MoE Vision Language Model from Scratch By AviSoori1x • Jun 23, 2024 • 33