RLHF

Secrets of RLHF in Large Language Models Part I: PPO
Paper • 2307.04964 • Published • 29

Safe RLHF: Safe Reinforcement Learning from Human Feedback
Paper • 2310.12773 • Published • 28

Stabilizing RLHF through Advantage Model and Selective Rehearsal
Paper • 2309.10202 • Published • 11

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Paper • 2310.00212 • Published • 2