Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion • Paper • 2406.19185 • Published Jun 27, 2024