Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Paper • 2405.08707 • Published May 14, 2024 • 34
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment Paper • 2507.20984 • Published Jul 28 • 56