RE-MOVE: An Adaptive Policy Design Approach for Dynamic Environments via Language-Based Feedback Paper • 2303.07622 • Published Mar 14, 2023
Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL Paper • 2305.17342 • Published May 27, 2023
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning Paper • 2308.02585 • Published Aug 3, 2023
MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences Paper • 2402.08925 • Published Feb 14, 2024
Is poisoning a real threat to LLM alignment? Maybe more so than you think Paper • 2406.12091 • Published Jun 17, 2024
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment Paper • 2411.18688 • Published Nov 27, 2024
Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment Paper • 2501.03486 • Published Jan 7, 2025