Open to Collab

ben burtenshaw PRO

burtenshaw

huggingface

AI & ML interests

None yet

Recent Activity

new activity about 7 hours ago

allenai/olmOCR-bench: Set evaluation_framework to inspect-ai in eval.yaml

new activity about 7 hours ago

openai/gsm8k: Set evaluation_framework to inspect-ai in eval.yaml

new activity about 7 hours ago

Idavidrein/gpqa: Set evaluation_framework to inspect-ai in eval.yaml

upvoted a collection 3 days ago

Reasoning Games

6 items • Updated 3 days ago • 1

upvoted an article 6 days ago

OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments

Article • 7 days ago • 23

upvoted an article 8 days ago

Transformers.js v4 Preview: Now Available on NPM!

Article • 10 days ago • 69

upvoted an article 11 days ago

From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output

Article • 11 days ago • 20

upvoted a paper 12 days ago

Rethinking the Trust Region in LLM Reinforcement Learning

Paper • 2602.04879 • Published 14 days ago • 33

upvoted an article 13 days ago

Community Evals: Because we're done trusting black-box leaderboards over the community

Article • 15 days ago • 70

upvoted an article 14 days ago

We Got Claude to Build CUDA Kernels and teach open models!

Article • 22 days ago • 138

upvoted a paper 15 days ago

PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published 19 days ago • 188

upvoted 2 articles 27 days ago

New in llama.cpp: Anthropic Messages API

Article • about 1 month ago • 36

Optimizing GLM4-MoE for Production: 65% Faster TTFT with SGLang

Article • 28 days ago • 10

upvoted 2 articles about 1 month ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • Dec 1, 2025 • 298

Open Responses: What you need to know

Article • Jan 15 • 106

upvoted 4 articles 2 months ago

Shadow AI - Where are the CIOs?

Article • Dec 19, 2025 • 31

Phare LLM benchmark V2: Reasoning models don't guarantee better security

Article • Dec 16, 2025 • 10

Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models

Article • Dec 15, 2025 • 108

Why You Should Care About Partial Differential Equations (PDEs)

Article • Dec 12, 2025 • 41

upvoted a collection 2 months ago

NeMo Gym

Collection of RL verifiable data for NeMo Gym • 13 items • Updated 14 days ago • 39

upvoted 2 articles 2 months ago

Mastering Long Contexts in LLMs with KVPress

Article • Jan 23, 2025 • 73

How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day

Article • Dec 8, 2025 • 52

upvoted an article 3 months ago

We Got Claude to Fine-Tune an Open Source LLM

Article • Dec 4, 2025 • 594