---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.2
---

# Model Card for Model ID

## Model Details

### Model Description

This model was obtained by alignment training of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) with the alignment algorithm (RAHF) introduced in ["Aligning Large Language Models with Human Preferences through Representation Engineering"](https://arxiv.org/abs/2312.15997), using the [UltraFeedback dataset](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned). The training code for RAHF is available at [this link](https://github.com/LiuAmber/RAHF). A detail worth noting is that the extracted representations are superposed onto Mistral-7B. A usage sketch is given at the end of this card.

- **Developed by:** Wenhao Liu and Xiaohua Wang
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** LoraModel
- **Language(s) (NLP):** [More Information Needed]
- **License:** apache-2.0
- **Finetuned from model [optional]:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

## Citation [optional]

**BibTeX:**

```
@article{liu2023aligning,
  title={Aligning large language models with human preferences through representation engineering},
  author={Liu, Wenhao and Wang, Xiaohua and Wu, Muling and Li, Tianlong and Lv, Changze and Ling, Zixuan and Zhu, Jianhao and Zhang, Cenyuan and Zheng, Xiaoqing and Huang, Xuanjing},
  journal={arXiv preprint arXiv:2312.15997},
  year={2023}
}
```
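
## How to Get Started with the Model

Below is a minimal sketch of how this LoRA adapter could be loaded on top of the base model with 🤗 Transformers and PEFT. The adapter repository id is a placeholder (this card does not state the Hub id), and the generation settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"
# Placeholder: replace with the actual Hub id or local path of this adapter.
adapter_id = "path/to/rahf-lora-adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the LoRA weights produced by RAHF alignment training to the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "How can I improve my time management skills?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```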