Mechanistic Interpretability Benchmark

university
https://mib-bench.github.io

AI & ML interests

Principled evaluation of mechanistic interpretability methods.

Recent Activity

hij authored a paper 6 days ago
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
hij authored a paper 6 days ago
LLMs Encode Harmfulness and Refusal Separately
hij authored a paper 6 days ago
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

Team members

Aaron Mueller, Hadas Orgad, Yonatan Belinkov, Atticus Geiger, Amir Zur, Aruna S, Yaniv Nikankin, Michael Hanna, Dana Arad, Jing, Nikhil Prakash, Rohan Gupta, Ivan Arcuschin, Alessandro Stolfo, Martin Tutek, shun shao, Yik Siu Chan, Adam Belfki, Sarah Wiegreffe

mib-bench's datasets (7)

mib-bench/ravel

Viewer • Updated May 31 • 117k • 21

mib-bench/arithmetic_subtraction

Viewer • Updated May 31 • 20.9k • 30

mib-bench/arithmetic_addition

Viewer • Updated May 31 • 40.4k • 67

mib-bench/ioi

Viewer • Updated May 29 • 21k • 1.27k

mib-bench/arc_easy

Viewer • Updated Jan 25 • 4.01k • 165

mib-bench/arc_challenge

Viewer • Updated Jan 25 • 2k • 143

mib-bench/copycolors_mcqa

Viewer • Updated Jan 16 • 1.89k • 353