sapphia-410m-RM

A super duper ultra highly experimental LoRA finetune of EleutherAI/pythia-410m-deduped on argilla/dpo-mix-7k, intended to be a reward model.

why?

Nexusflow achieved good results with traditional reward model finetuning! why not meeeeeee :3
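A minimal usage sketch. The card does not document the model head or loading code, so everything below is an assumption: the LoRA adapter is applied via peft to a single-logit sequence-classification head on the base model. Only the base and adapter IDs come from this card.

```python
def load_reward_model(
    base_id: str = "EleutherAI/pythia-410m-deduped",
    adapter_id: str = "Fizzarolli/sapphia-410m-RM",
):
    # Deferred imports so this sketch parses even without transformers/peft installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # ASSUMPTION: a single-logit classification head; the card does not confirm this.
    base = AutoModelForSequenceClassification.from_pretrained(base_id, num_labels=1)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer


def score(model, tokenizer, prompt: str, response: str) -> float:
    # A reward model maps (prompt, response) text to a scalar preference score.
    import torch

    inputs = tokenizer(prompt + response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 1) with a single-logit head
    return logits.squeeze().item()
```

Higher scores would indicate a more preferred response, e.g. comparing `score(m, t, q, a1)` against `score(m, t, q, a2)` for two candidate answers to the same prompt.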
