Generalizable Reward Models
- Ray2333/GRM-llama3-8B-sftreg
  Text Classification • Updated • 120 • 5
- Ray2333/GRM-llama3-8B-distill
  Text Classification • Updated • 136 • 6
- Ray2333/GRM-Gemma-2B-sftreg
  Text Classification • Updated • 301 • 3
- Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
  Paper • 2406.10216 • Published • 2