Collections including paper arxiv:2401.10020 (Self-Rewarding Language Models)

- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
  Paper • 2312.06585 • Published • 29
- Enable Language Models to Implicitly Learn Self-Improvement From Data
  Paper • 2310.00898 • Published • 23
- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
  Paper • 2312.10003 • Published • 40
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
  Paper • 2401.01335 • Published • 65
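
The papers in this collection share a generate-score-finetune recipe: sample candidate outputs from the current model, keep the ones a scorer accepts, and fine-tune on them. A minimal sketch of one such iteration; `model`, `generate`, `score`, and `finetune` are hypothetical stand-ins, not an API from any of these papers, and the threshold filter is only one of the selection rules they study:

```python
# Toy sketch of one ReST-style self-training iteration (cf. 2312.06585).
# `model`, `generate`, `score`, and `finetune` are hypothetical placeholders.

def self_training_step(model, prompts, generate, score, finetune,
                       samples_per_prompt=4, threshold=0.5):
    dataset = []
    for prompt in prompts:
        # Grow: sample several candidate solutions from the current model.
        candidates = [generate(model, prompt) for _ in range(samples_per_prompt)]
        # Improve: keep only candidates the scorer (a learned reward model,
        # a verifier, or the model judging itself) accepts.
        dataset += [(prompt, c) for c in candidates if score(prompt, c) >= threshold]
    # Fine-tune on the filtered self-generated data and return the new model.
    return finetune(model, dataset)
```
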
- The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
  Paper • 2210.14986 • Published • 5
- Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
  Paper • 2311.10702 • Published • 20
- Large Language Models as Optimizers
  Paper • 2309.03409 • Published • 76
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
  Paper • 2309.04269 • Published • 33
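
Chain of Density (2309.04269) is a pure prompting technique: the model repeatedly rewrites a summary, folding in missing salient entities without letting the length grow. A paraphrased prompt skeleton, assuming a generic chat-completion client; the exact wording in the paper differs:

```python
# Paraphrase of the Chain of Density prompt idea (2309.04269); the wording
# here is illustrative, not the paper's verbatim prompt.
ARTICLE = "..."  # source document to summarize

COD_PROMPT = f"""Article: {ARTICLE}

Repeat the following two steps 5 times.
Step 1: Identify 1-3 informative entities from the article that are
missing from the previous summary.
Step 2: Write a new, denser summary of identical length that covers every
entity from the previous summary plus the missing entities.
Answer with a JSON list of dicts with keys
"missing_entities" and "denser_summary"."""
```
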
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 54
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
  Paper • 2306.01693 • Published • 3
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 147
- Secrets of RLHF in Large Language Models Part II: Reward Modeling
  Paper • 2401.06080 • Published • 28
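
DPO (2305.18290) replaces the reward-model-plus-PPO pipeline with a single classification-style loss on preference pairs, and Self-Rewarding Language Models reuses that loss on self-generated preferences. A minimal PyTorch sketch of the DPO loss; the `*_logps` inputs are assumed to be summed token log-probabilities of each completion under the policy or the frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (2305.18290), minimal sketch."""
    # Implicit rewards: beta-scaled log-ratios of policy to reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```
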
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 73
- ToolTalk: Evaluating Tool-Usage in a Conversational Setting
  Paper • 2311.10775 • Published • 10
- Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
  Paper • 2311.11077 • Published • 28
- MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
  Paper • 2311.11501 • Published • 36
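
The adapter papers above build on LoRA, which freezes a pretrained weight matrix W and learns a low-rank update so the adapted layer computes Wx + (α/r)BAx. A minimal PyTorch sketch of the idea (not the Adapters library's API):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (LoRA)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        # B starts at zero so training begins from the unmodified base layer.
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```
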
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters
  Paper • 2311.03285 • Published • 32
- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 25
- zhihan1996/DNABERT-2-117M
  Updated • 1.43M • 63
- AIRI-Institute/gena-lm-bert-base
  Updated • 136 • 28
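
S-LoRA borrows its unified paging from PagedAttention (2309.06180), which stores the KV cache in fixed-size blocks addressed through a per-sequence block table, much like virtual-memory pages. A toy Python illustration of that indirection, not vLLM's implementation:

```python
# Toy block-table bookkeeping in the spirit of PagedAttention (2309.06180);
# a real engine stores key/value tensors per block, this only tracks the
# logical-to-physical mapping.
BLOCK_SIZE = 16

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))  # pool of physical blocks
        self.block_tables = {}                      # seq_id -> physical block ids
        self.lengths = {}                           # seq_id -> tokens written

    def append_token(self, seq_id: int):
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:   # current block full: map a new one
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1

    def free(self, seq_id: int):
        # Blocks return to the pool immediately, so memory is not fragmented
        # by sequence length the way contiguous preallocation is.
        self.free_blocks += self.block_tables.pop(seq_id, [])
        self.lengths.pop(seq_id, None)
```
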
- Moral Foundations of Large Language Models
  Paper • 2310.15337 • Published • 1
- Specific versus General Principles for Constitutional AI
  Paper • 2310.13798 • Published • 3
- Contrastive Preference Learning: Learning from Human Feedback without RL
  Paper • 2310.13639 • Published • 25
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 48
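
RLAIF (2309.00267) swaps human annotators for an off-the-shelf LLM that labels which of two responses is better, optionally steered by written principles as in Constitutional AI. A sketch of such a labeling call; `llm` is a hypothetical callable returning the judge model's text output, and the prompt wording is illustrative:

```python
# Sketch of AI preference labeling in the spirit of RLAIF (2309.00267).
# `llm` is a hypothetical text-completion callable, not a real API.

def ai_preference(llm, prompt: str, response_a: str, response_b: str) -> str:
    judge_prompt = (
        "A good response is helpful, honest, and harmless.\n"
        f"Task: {prompt}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Which response is better? Answer with exactly 'A' or 'B'."
    )
    verdict = llm(judge_prompt).strip()
    return verdict if verdict in ("A", "B") else "A"  # crude parse fallback
```
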
- Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
  Paper • 2310.00212 • Published • 2
- Stabilizing RLHF through Advantage Model and Selective Rehearsal
  Paper • 2309.10202 • Published • 11
- Aligning Language Models with Offline Reinforcement Learning from Human Feedback
  Paper • 2308.12050 • Published • 1
- Secrets of RLHF in Large Language Models Part I: PPO
  Paper • 2307.04964 • Published • 29
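
The papers in this collection analyze or modify PPO for RLHF. At the core of PPO is the clipped surrogate objective, which keeps the new-to-old policy probability ratio close to 1; a minimal PyTorch sketch:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate (to be minimized) on per-token log-probs."""
    ratio = torch.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: take the smaller improvement, then negate for a loss.
    return -torch.min(unclipped, clipped).mean()
```
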
- Secrets of RLHF in Large Language Models Part I: PPO
  Paper • 2307.04964 • Published • 29
- Safe RLHF: Safe Reinforcement Learning from Human Feedback
  Paper • 2310.12773 • Published • 28
- Stabilizing RLHF through Advantage Model and Selective Rehearsal
  Paper • 2309.10202 • Published • 11
- Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
  Paper • 2310.00212 • Published • 2
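
Safe RLHF (2310.12773) trains separate reward and cost models and maximizes reward subject to a cost budget via a Lagrangian relaxation. A toy sketch of the dual-ascent update on the multiplier; the per-iteration costs and step size below are invented for illustration:

```python
# Toy Lagrangian dual update in the spirit of Safe RLHF (2310.12773).
lam, lr_dual, cost_limit = 1.0, 0.05, 0.0  # invented illustrative values

def lagrangian_objective(reward: float, cost: float) -> float:
    # The policy step maximizes reward minus the weighted cost violation.
    return reward - lam * (cost - cost_limit)

for avg_cost in [0.4, 0.2, -0.1]:  # invented average costs per iteration
    # Dual ascent: raise lambda while the constraint is violated, floor at 0.
    lam = max(0.0, lam + lr_dual * (avg_cost - cost_limit))
print(lam)
```
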