AI & ML interests

Principled evaluation of mechanistic interpretability methods.
