Hugging Face
Fizzarolli/sapphia-410m-RM
PEFT
Safetensors
argilla/dpo-mix-7k
RLHF
RLAIF
PPO
RM
reward-model
reward_model
License: apache-2.0
Fizzarolli committed on Apr 2, 2024
Commit 280daca · verified · 1 Parent(s): dcc62ab
Update README.md
Files changed (1)
README.md +2 -0
@@ -9,6 +9,8 @@ tags:
 - RLAIF
 - PPO
 - RM
+- reward-model
+- reward_model
 ---
 
 # sapphia-410m-RM