Collections

Collections including paper arxiv:1707.06347

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 8

Papers (I want) To Read

A list of papers on my reading list.

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

Paper • 2304.09842 • Published Apr 19, 2023 • 1

ReAct: Synergizing Reasoning and Acting in Language Models

Paper • 2210.03629 • Published Oct 6, 2022 • 24

Gorilla: Large Language Model Connected with Massive APIs

Paper • 2305.15334 • Published May 24, 2023 • 5

Reflexion: Language Agents with Verbal Reinforcement Learning

Paper • 2303.11366 • Published Mar 20, 2023 • 5

Reinforcement learning (RL)

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 8

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Paper • 2306.01693 • Published Jun 2, 2023 • 3

Generative Verifiers: Reward Modeling as Next-Token Prediction

Paper • 2408.15240 • Published Aug 27, 2024 • 13

Diffusion Policy Policy Optimization

Paper • 2409.00588 • Published Sep 1, 2024 • 20

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 8

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 53

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 147

Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 17

Interesting AI papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 55

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 17

Universal Language Model Fine-tuning for Text Classification

Paper • 1801.06146 • Published Jan 18, 2018 • 6

Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 13