- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
  Paper • 2304.09842 • Published • 1
- ReAct: Synergizing Reasoning and Acting in Language Models
  Paper • 2210.03629 • Published • 24
- Gorilla: Large Language Model Connected with Massive APIs
  Paper • 2305.15334 • Published • 5
- Reflexion: Language Agents with Verbal Reinforcement Learning
  Paper • 2303.11366 • Published • 5
Collections
Collections including paper arxiv:1909.08593
- Fine-Tuning Language Models from Human Preferences
  Paper • 1909.08593 • Published • 3
- Transforming and Combining Rewards for Aligning Large Language Models
  Paper • 2402.00742 • Published • 12
- Leverage the Average: an Analysis of KL Regularization in RL
  Paper • 2003.14089 • Published • 2
- Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
  Paper • 2404.01258 • Published • 12

- The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
  Paper • 1801.03924 • Published • 2
- Fine-Tuning Language Models from Human Preferences
  Paper • 1909.08593 • Published • 3
- Training Verifiers to Solve Math Word Problems
  Paper • 2110.14168 • Published • 4
- Learning Transferable Visual Models From Natural Language Supervision
  Paper • 2103.00020 • Published • 11

- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 53
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
  Paper • 2402.09320 • Published • 6
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 41
- Dueling RL: Reinforcement Learning with Trajectory Preferences
  Paper • 2111.04850 • Published • 2

- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 58
- WARM: On the Benefits of Weight Averaged Reward Models
  Paper • 2401.12187 • Published • 18
- RewardBench: Evaluating Reward Models for Language Modeling
  Paper • 2403.13787 • Published • 22
- DreamReward: Text-to-3D Generation with Human Preference
  Paper • 2403.14613 • Published • 37