Hugging Face
helloTR/dpo-training-fixed
Tags: Transformers, Safetensors, Generated from Trainer, trl, dpo, arxiv:2305.18290
Branch: main
Total size: 8.65 MB, 1 contributor, History: 2 commits
Latest commit: 57b4b16 (verified) by helloTR, "helloTR/llama3.2-1b-dpo-fixed", 9 months ago
File                       Size       Storage  Last commit message            Updated
.gitattributes             1.52 kB    -        initial commit                 9 months ago
README.md                  2.44 kB    -        helloTR/llama3.2-1b-dpo-fixed  9 months ago
adapter_config.json        789 Bytes  -        helloTR/llama3.2-1b-dpo-fixed  9 months ago
adapter_model.safetensors  4.52 MB    xet      helloTR/llama3.2-1b-dpo-fixed  9 months ago
special_tokens_map.json    437 Bytes  -        helloTR/llama3.2-1b-dpo-fixed  9 months ago
tokenizer.json             3.62 MB    -        helloTR/llama3.2-1b-dpo-fixed  9 months ago
tokenizer.model            500 kB     xet      helloTR/llama3.2-1b-dpo-fixed  9 months ago
tokenizer_config.json      1.4 kB     -        helloTR/llama3.2-1b-dpo-fixed  9 months ago
training_args.bin          6.14 kB    xet      helloTR/llama3.2-1b-dpo-fixed  9 months ago