details pls

#1
by archit11 - opened

Can you give some details on what data it was trained on and for how many steps? I tried doing GRPO with SmolLM 350M on GSM8K, but the results were really bad, so I stopped after a few steps.

This was just a random dry run, so don't expect anything from it. It was trained on 'trl-lib/ultrafeedback_binarized'.
I am planning to experiment with GRPO and SmolLM this week, so let's see how that goes.

ubermenchh changed discussion status to closed