# TRL - Transformer Reinforcement Learning

TRL is a full-stack library that provides a set of tools to train transformer language models with Reinforcement Learning, from the Supervised Fine-tuning (SFT) step and the Reward Modeling (RM) step to the Proximal Policy Optimization (PPO) step. The library is integrated with 🤗 [transformers](https://github.com/huggingface/transformers).

Check the appropriate sections of the documentation depending on your needs:

## API documentation

- [Model Classes](models): *A brief overview of what each public model class does.*
- [`SFTTrainer`](sft_trainer): *Easily perform supervised fine-tuning of your model with `SFTTrainer`.*
- [`RewardTrainer`](reward_trainer): *Easily train your reward model using `RewardTrainer`.*
- [`PPOTrainer`](ppo_trainer): *Further fine-tune the supervised fine-tuned model using the PPO algorithm.*
- [Best-of-N Sampling](best-of-n): *Use best-of-n sampling as an alternative way to sample predictions from your active model.*
- [`DPOTrainer`](dpo_trainer): *Direct Preference Optimization training using `DPOTrainer`.*
- [`TextEnvironment`](text_environment): *Text environment to train your model using tools with RL.*

## Examples

- [Sentiment Tuning](sentiment_tuning): *Fine-tune your model to generate positive movie content.*
- [Training with PEFT](lora_tuning_peft): *Memory-efficient RLHF training using adapters with PEFT.*
- [Detoxifying LLMs](detoxifying_a_lm): *Detoxify your language model through RLHF.*
- [StackLlama](using_llama_models): *End-to-end RLHF training of a Llama model on the Stack Exchange dataset.*
- [Learning with Tools](learning_tools): *Walkthrough of using `TextEnvironment`.*
- [Multi-Adapter Training](multi_adapter_rl): *Use a single base model and multiple adapters for memory-efficient end-to-end training.*

## Blog posts

- Illustrating Reinforcement Learning from Human Feedback
- Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
- StackLLaMA: A hands-on guide to train LLaMA with RLHF
- Fine-tune Llama 2 with DPO
- Finetune Stable Diffusion Models with DDPO via TRL