llama-3.1-8B-grpo / trainer_state.json

Commit History