A General Theoretical Paradigm to Understand Learning from Human Preferences Paper • 2310.12036 • Published Oct 18, 2023
Accelerating Nash Learning from Human Feedback via Mirror Prox Paper • 2505.19731 • Published May 26