Apertus: Democratizing Open and Compliant LLMs for Global Language Environments Paper • 2509.14233 • Published Sep 17, 2025 • 14
DataPerf: Benchmarks for Data-Centric AI Development Paper • 2207.10062 • Published Jul 20, 2022 • 1
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 56
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 42