Online RLHF - a RLHFlow Collection

Online RLHF

Updated Jun 12, 2024

Datasets, code, and models for online RLHF (i.e., iterative DPO)
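As a rough illustration of what "iterative DPO" refers to, the sketch below shows one way a single round could turn sampled responses into a preference pair and score it with the standard DPO loss. It is a minimal sketch, not the collection's actual training code: the function names (`build_preference_pair`, `dpo_loss`), the best-versus-worst pairing rule, the toy length-based reward, and the `beta=0.1` default are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Tuple


def build_preference_pair(
    prompt: str,
    responses: List[str],
    reward_fn: Callable[[str, str], float],
) -> Tuple[str, str]:
    """Return (chosen, rejected) by ranking responses with a reward function.

    Assumption: the highest-scored response is paired against the
    lowest-scored one. Other pairing rules are possible.
    """
    ranked = sorted(responses, key=lambda r: reward_fn(prompt, r))
    return ranked[-1], ranked[0]


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default, not taken from the source
) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * margin)).

    margin is the policy-vs-reference log-ratio of the chosen response
    minus the same log-ratio for the rejected response.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # Numerically stable softplus(-x) = max(-x, 0) + log(1 + exp(-|x|)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Toy reward for demonstration only: longer responses score higher.
    toy_reward = lambda prompt, response: float(len(response))
    chosen, rejected = build_preference_pair("Hi", ["a", "abc", "ab"], toy_reward)
    print(chosen, rejected)
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5))
```

In an online setting, this pairing step would be repeated each iteration with responses sampled from the current policy, so the preference data tracks the model being trained.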

RLHFlow/prompt-collection-v0.1

Viewer • Updated May 8, 2024 • 179k • 49 • 9
RLHFlow/pair-preference-model-LLaMA3-8B

Text Generation • Updated Oct 14, 2024 • 1.32k • 38
sfairXC/FsfairX-LLaMA3-RM-v0.1

Text Classification • Updated Oct 14, 2024 • 6.14k • 54
RLHFlow/SFT-OpenHermes-2.5-Standard

Viewer • Updated Apr 24, 2024 • 1M • 40 • 2
RLHFlow/iterative-prompt-v1-iter2-20K

Viewer • Updated May 3, 2024 • 20k • 41 • 2
RLHFlow/iterative-prompt-v1-iter3-20K

Viewer • Updated May 3, 2024 • 20k • 33 • 3
RLHFlow/iterative-prompt-v1-iter1-20K

Viewer • Updated May 3, 2024 • 20k • 65 • 2
Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R

Text Generation • Updated 10 days ago • 111 • 78
RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13, 2024 • 67
Salesforce/LLaMA-3-8B-SFR-SFT-R

Text Generation • Updated 10 days ago • 25 • 8
RLHFlow/LLaMA3-SFT

Text Generation • Updated Nov 3, 2024 • 8.85k • 10
RLHFlow/LLaMA3-iterative-DPO-final

Text Generation • Updated Oct 14, 2024 • 6.44k • 40
RLHFlow/iterative-prompt-v1-iter4-20K

Viewer • Updated Jun 12, 2024 • 20k • 39
RLHFlow/iterative-prompt-v1-iter5-20K

Viewer • Updated Jun 12, 2024 • 20k • 37
RLHFlow/iterative-prompt-v1-iter6-20K

Viewer • Updated Jun 12, 2024 • 20k • 35
RLHFlow/iterative-prompt-v1-iter7-20K

Viewer • Updated Jun 12, 2024 • 20k • 38
RLHFlow/iterative-prompt-v1-iter8-20K

Viewer • Updated Jun 12, 2024 • 20k • 37
RLHFlow/iterative-prompt-v1-iter9-20K

Viewer • Updated Jun 12, 2024 • 19.9k • 43 • 1