|
--- |
|
base_model: mistralai/Mistral-7B-v0.1 |
|
datasets: |
|
- siqi00/mistral_ultrafeedback_unhelpful_chatprompt_0.7_1.0_50_320 |
|
library_name: transformers |
|
license: apache-2.0 |
|
tags: |
|
- alignment-handbook |
|
- generated_from_trainer |
|
pipeline_tag: text-generation |
|
model-index: |
|
- name: mistral-feedbuhcp2-dft-lr2e-6-tau1.0-u_init0-s2-e2-gamma0.85 |
|
results: [] |
|
--- |
|
|
|
# Mistral-7B-DFT |
|
|
|
This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [siqi00/mistral_ultrafeedback_unhelpful_chatprompt_0.7_1.0_50_320](https://huggingface.co/datasets/siqi00/mistral_ultrafeedback_unhelpful_chatprompt_0.7_1.0_50_320) dataset. It was fine-tuned as part of the paper [Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data](https://arxiv.org/abs/2502.18679).
|
|
|
The code is available at https://github.com/PenGuln/DFT. |
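
To inspect the training data, the dataset can be loaded directly from the Hugging Face Hub. The sketch below is for illustration only; the `train` split name is an assumption, so check the dataset card for the exact splits.

```python
from datasets import load_dataset

# Load the UltraFeedback-derived prompt dataset used for fine-tuning.
# The "train" split is an assumption; see the dataset card for available splits.
ds = load_dataset(
    "siqi00/mistral_ultrafeedback_unhelpful_chatprompt_0.7_1.0_50_320",
    split="train",
)
print(ds)      # dataset size and column names
print(ds[0])   # first example
```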
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training; an illustrative `TrainingArguments` sketch follows the list:
|
- learning_rate: 2e-06 |
|
- train_batch_size: 4 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 4 |
|
- gradient_accumulation_steps: 8 |
|
- total_train_batch_size: 128 |
|
- total_eval_batch_size: 32 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_ratio: 0.1 |
|
- num_epochs: 2 |
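
For reference, these settings map roughly onto standard `transformers.TrainingArguments` as sketched below. This is an illustration only, not the actual launch script: training used the DFT code linked above on 4 GPUs, and `output_dir` and `bf16` here are assumptions rather than values stated on this card.

```python
from transformers import TrainingArguments

# Illustrative mapping of the hyperparameters above onto TrainingArguments.
# Effective batch size: 4 (per device) x 4 (GPUs) x 8 (grad. accumulation) = 128.
training_args = TrainingArguments(
    output_dir="mistral-7b-dft",   # hypothetical output path
    learning_rate=2e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                     # assumption: training precision is not stated on the card
    optim="adamw_torch",           # default betas=(0.9, 0.999) and eps=1e-8 match the values above
)
```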
|
|
|
### Framework versions |
|
|
|
- Transformers 4.45.2 |
|
- Pytorch 2.1.2+cu121 |
|
- Datasets 3.0.1 |
|
- Tokenizers 0.20.1 |
|
|
|
### Usage example
|
|
|
The model can be used for text generation tasks. A basic example using the `transformers` library is shown below: |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch

model_id = "siqi00/Mistral-7B-DFT"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature to take effect
generation_config = GenerationConfig(max_new_tokens=20, temperature=0.7, do_sample=True)

# Pass the full tokenizer output so the attention mask is included
outputs = model.generate(**inputs, generation_config=generation_config)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
|
|
|
Remember to install the necessary libraries (`pip install transformers accelerate`; `accelerate` is required for `device_map="auto"`) and adjust parameters such as `temperature` and `max_new_tokens` to suit your use case.
|
|
|
## Citation |
|
|
|
```bibtex |
|
@misc{guo2025discriminativefinetuninggenerativelarge, |
|
title={Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data}, |
|
author={Siqi Guo and Ilgee Hong and Vicente Balmaseda and Tuo Zhao and Tianbao Yang}, |
|
year={2025}, |
|
eprint={2502.18679}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2502.18679}, |
|
} |
|
``` |