Value-Guided Search for Efficient Chain-of-Thought Reasoning Paper • 2505.17373 • Published May 23, 2025 • 5
Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR Paper • 2302.03201 • Published Feb 7, 2023
Switching the Loss Reduces the Cost in Batch Reinforcement Learning Paper • 2403.05385 • Published Mar 8, 2024
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning Paper • 2407.15762 • Published Jul 22, 2024 • 10
Unsupervised Out-of-Distribution Detection with Diffusion Inpainting Paper • 2302.10326 • Published Feb 20, 2023