PedagogyRL-Experiments
OpenLearnLM/deepseek_qwen3_8b_pedagogical_think_reward_grpo_step_300 8B • Updated Jul 9 • 4
OpenLearnLM/deepseek_qwen3_8b_pedagogical_think_noreward_grpo_step_300 8B • Updated Jul 9 • 3
OpenLearnLM/deepseek_qwen3_8b_nothink_grpo_step_300 8B • Updated Jul 9 • 4