ChenWu98/opd_grpo_verifier_hard_Qwen-Qwen3-8B_alpha0.5_lr1e-6_opd1.0_pg1.0_k3 Updated about 13 hours ago
ChenWu98/grpo_sciknoweval_from_math_easy_Qwen-Qwen2.5-1.5B-Instruct_lr1e-6_global_step_3400 Updated 26 days ago
ChenWu98/grpo_rl_ref_math_easy_Qwen-Qwen2.5-Math-1.5B-Instruct_rlglobal_step_1600_kl1.0_lr1e-6 Updated 28 days ago
ChenWu98/grpo_Qwen-Qwen3-8B_ref_math_hard_Qwen-Qwen2.5-1.5B-Instruct_kl1.0_lr1e-6_kl_incorrect Updated 28 days ago
ChenWu98/grpo_Qwen-Qwen2.5-7B-Instruct_ref_math_easy_Qwen-Qwen2.5-1.5B-Instruct_kl1.0_lr1e-6 Updated 29 days ago
ChenWu98/grpo_Qwen-Qwen3-8B_ref_math_easy_Qwen-Qwen2.5-Math-1.5B-Instruct_kl0.1_lr1e-6 Updated about 1 month ago
ChenWu98/opd_math_hard_Qwen-Qwen2.5-Math-1.5B-Instruct_rlglobal_step_3400_alpha1.0_lr1e-6 Updated about 1 month ago
ChenWu98/grpo_rl_ref_math_hard_Qwen-Qwen2.5-Math-1.5B-Instruct_rlglobal_step_3400_kl2.0_lr1e-6 Updated Feb 10
ChenWu98/grpo_rl_ref_math_easy_Qwen-Qwen2.5-Math-1.5B-Instruct_rlglobal_step_1600_kl0.1_lr1e-6 Updated Feb 10
ChenWu98/opd_math_easy_Qwen-Qwen2.5-Math-1.5B-Instruct_rlglobal_step_1600_alpha0.5_lr1e-6 Updated Feb 10