Predicting the Order of Upcoming Tokens Improves Language Modeling Paper • 2508.19228 • Published 7 days ago
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax Paper • 2504.20966 • Published Apr 29
COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances Paper • 2311.01012 • Published Nov 2, 2023