Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • 5 days ago • 20
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 7 days ago • 154
Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents Paper • 2406.04028 • Published Jun 6, 2024 • 1
Extending the Massive Text Embedding Benchmark to French Paper • 2405.20468 • Published May 30, 2024 • 2