GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling Paper • 2506.22049 • Published Jun 27, 2025 • 2
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published Apr 9, 2025 • 76