Scale Safety Research
Enterprise
community
AI & ML interests
None defined yet.
Collections
2
scale-safety-research/synth_docs_honly_and_claude_anti_reward_hacking
Viewer • Updated • 50k • 132
scale-safety-research/synth_docs_honly_and_claude_pro_reward_hacking
Viewer • Updated • 50k • 136
scale-safety-research/synth_docs_honly_and_claude_situational_adversarial_robustness
Viewer • Updated • 50k • 134
scale-safety-research/synth_docs_honly_and_alignment_faking_paper
Viewer • Updated • 50k • 149
Models
None public yet
Datasets
14
scale-safety-research/insider_trading
Viewer • Updated • 1.01k • 7
scale-safety-research/roleplaying
Viewer • Updated • 742 • 15
scale-safety-research/instructed_pairs
Viewer • Updated • 612 • 30
scale-safety-research/synth_docs_honly_and_principles_and_chat
Viewer • Updated • 50k • 134
scale-safety-research/synth_docs_honly_and_principles
Viewer • Updated • 50k • 109
scale-safety-research/synth_docs_honly
Viewer • Updated • 30k • 130
scale-safety-research/synth_docs_honly_and_claude_anti_reward_hacking
Viewer • Updated • 50k • 132
scale-safety-research/synth_docs_honly_and_claude_pro_reward_hacking
Viewer • Updated • 50k • 136
scale-safety-research/synth_docs_honly_and_longtermist_claude
Viewer • Updated • 50k • 110
scale-safety-research/synth_docs_honly_and_hubinger_mesaoptimizers
Viewer • Updated • 50k • 122