Article OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments +3 7 days ago • 23
Article From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output 11 days ago • 20
Rethinking the Trust Region in LLM Reinforcement Learning Paper • 2602.04879 • Published 14 days ago • 33
Article Community Evals: Because we're done trusting black-box leaderboards over the community +5 15 days ago • 70
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published 19 days ago • 188
Article Optimizing GLM4-MoE for Production: 65% Faster TTFT with SGLang 28 days ago • 10
Article Transformers v5: Simple model definitions powering the AI ecosystem +2 Dec 1, 2025 • 298
Article Phare LLM benchmark V2: Reasoning models don't guarantee better security Dec 16, 2025 • 10
Article Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models Dec 15, 2025 • 108
Article Why You Should Care About Partial Differential Equations (PDEs) Dec 12, 2025 • 41
NeMo Gym Collection Collection of RL verifiable data for NeMo Gym • 13 items • Updated 14 days ago • 39
Article How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day Dec 8, 2025 • 52