Mechanistic Interpretability Benchmark

university

https://mib-bench.github.io

AI & ML interests

Principled evaluation of mechanistic interpretability methods.

Recent Activity

amueller updated a Space 21 days ago

mib-bench/leaderboard

hij authored a paper 2 months ago

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

hij authored a paper 2 months ago

LLMs Encode Harmfulness and Refusal Separately

Collections 1

Spaces 1

MIB Leaderboard

Leaderboard for the Mechanistic Interpretability Benchmark

Models 3

mib-bench/mib-circuits-example

mib-bench/mib-causalvariable-example

mib-bench/interpbench

Datasets 7

mib-bench/ravel

Updated May 31 • 117k • 36

mib-bench/arithmetic_subtraction

Updated May 31 • 20.9k • 25

mib-bench/arithmetic_addition

Updated May 31 • 40.4k • 48

mib-bench/ioi

Updated May 29 • 21k • 1.31k

mib-bench/arc_easy

Updated Jan 25 • 4.01k • 84

mib-bench/arc_challenge

Updated Jan 25 • 2k • 19

mib-bench/copycolors_mcqa

Updated Jan 16 • 1.89k • 245