scale-safety-research/synth_docs_honly_and_claude_anti_reward_hacking • Updated Feb 13 • 50k • 125
scale-safety-research/synth_docs_honly_and_claude_pro_reward_hacking • Updated Feb 13 • 50k • 130
scale-safety-research/synth_docs_honly_and_claude_situational_adversarial_robustness • Updated Feb 13 • 50k • 129
scale-safety-research/synth_docs_honly_and_alignment_faking_paper • Updated Feb 13 • 50k • 146
scale-safety-research/synth_docs_honly_and_hubinger_mesaoptimizers • Updated Feb 13 • 50k • 111
scale-safety-research/synth_docs_honly_and_principles_and_chat • Updated 28 days ago • 50k • 146