Running • 869 • FineWeb: decanting the web for the finest text data at scale 🍷 Generate high-quality web text data for LLM training
Running • 2.24k • The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Article LeRobot goes to driving school: World’s largest open-source self-driving dataset 3 days ago • 41
Article Fine-tune Deepseek-R1 with a Synthetic Reasoning Dataset By sdiazlor • Feb 10 • 48
Article Halo: Open Source Health Tracking with Wearables By cyrilzakka • Nov 19, 2024 • 107
Article Build Your Own Browser-Based AI Coding Assistant with Gradio Lite and Transformers.js By luigi12345 • Nov 24, 2024 • 2
Article wHy DoNt YoU jUsT uSe ThE lLaMa ToKeNiZeR?? By catherinearnett • Sep 27, 2024 • 40
Article How to Use SSAST Model Weights in the HuggingFace Ecosystem? By Syoy • Aug 27, 2024 • 5