Collections
Collections including paper arxiv:1707.06347
-
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Paper • 2304.09842 • Published • 1
ReAct: Synergizing Reasoning and Acting in Language Models
Paper • 2210.03629 • Published • 24
Gorilla: Large Language Model Connected with Massive APIs
Paper • 2305.15334 • Published • 5
Reflexion: Language Agents with Verbal Reinforcement Learning
Paper • 2303.11366 • Published • 5
-
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 8
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Paper • 2306.01693 • Published • 3
Generative Verifiers: Reward Modeling as Next-Token Prediction
Paper • 2408.15240 • Published • 13
Diffusion Policy Policy Optimization
Paper • 2409.00588 • Published • 20
-
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 8
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 53
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 147
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 17
-
Attention Is All You Need
Paper • 1706.03762 • Published • 55
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 17
Universal Language Model Fine-tuning for Text Classification
Paper • 1801.06146 • Published • 6
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 13