SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper: 2502.02737
datatrove, for all things web-scale data preparation: https://github.com/huggingface/datatrove
nanotron, for lightweight 4D parallelism LLM training: https://github.com/huggingface/nanotron
lighteval, for in-training fast parallel LLM evaluations: https://github.com/huggingface/lighteval