Collections
Collections including paper arxiv:2305.14387
- HyperCLOVA X Technical Report
  Paper • 2404.01954 • Published • 21
- UltraFeedback: Boosting Language Models with High-quality Feedback
  Paper • 2310.01377 • Published • 5
- AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
  Paper • 2305.14387 • Published • 1
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 97

- BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
  Paper • 2403.18421 • Published • 23
- Long-form factuality in large language models
  Paper • 2403.18802 • Published • 25
- stanford-crfm/BioMedLM
  Text Generation • Updated • 5.38k • 416
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 53

- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 6
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 53
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 17