Alignment Science
non-profit
models 43
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defend_objects
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defend_objects
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_animal_welfare
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_secret_loyalty
datasets 15
alignment-science/collusion-signal-stress-test-full-basharena
Viewer • Updated • 150 • 77
alignment-science/collusion-signal-stress-test-rollouts
Viewer • Updated • 200 • 194
alignment-science/collusion-signal-stress-test-generated
Viewer • Updated • 200 • 62
alignment-science/collusion-signal-stress-test-generated-large
Viewer • Updated • 1k • 12
alignment-science/ultrachat-spanish-caps-caps-inoculated
Viewer • Updated • 5k • 136
alignment-science/ultrachat-spanish-caps-spanish-inoculated
Viewer • Updated • 5k • 134
alignment-science/ultrachat-spanish-caps-baseline
Viewer • Updated • 5k • 149
alignment-science/sycophancy-training
Viewer • Updated • 2k • 25
alignment-science/pirate-training
Viewer • Updated • 2k • 18
alignment-science/anthropic-hh-golden-dpo-prism
Viewer • Updated • 42.5k • 17