Alignment Science

non-profit

AI & ML interests

None defined yet.

Recent Activity

abhayesian updated a dataset 1 day ago

alignment-science/collusion-signal-stress-test-full-basharena

abhayesian published a dataset 3 days ago

alignment-science/collusion-signal-stress-test-full-basharena

abhayesian updated a dataset 6 days ago

alignment-science/collusion-signal-stress-test-rollouts

models 43

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defend_objects

Updated Feb 19

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations

Updated Feb 19

alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defend_objects

Updated Feb 19

alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations

Updated Feb 19

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users

Updated Feb 19

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation

Updated Feb 19

alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users

Updated Feb 19

alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation

Updated Feb 19

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_animal_welfare

Updated Feb 19

alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_secret_loyalty

Updated Feb 19

Collections 1

alignment-science/ihy-sft-dataset

alignment-science/llama_70b_ihy_sft

alignment-science/qwen_32b_ihy_sft

datasets 15

alignment-science/collusion-signal-stress-test-full-basharena

alignment-science/collusion-signal-stress-test-rollouts

alignment-science/collusion-signal-stress-test-generated

alignment-science/collusion-signal-stress-test-generated-large

alignment-science/ultrachat-spanish-caps-caps-inoculated

alignment-science/ultrachat-spanish-caps-spanish-inoculated

alignment-science/ultrachat-spanish-caps-baseline

alignment-science/sycophancy-training

alignment-science/pirate-training

alignment-science/anthropic-hh-golden-dpo-prism

Team members 1
