Diffusion World Model
Paper • 2402.03570
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Paper • 2401.16335
Towards Efficient and Exact Optimization of Language Model Alignment
Paper • 2402.00856
ODIN: Disentangled Reward Mitigates Hacking in RLHF
Paper • 2402.07319
Collections including paper arxiv:2407.08250