---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.2
---

# Model Card for Model ID

## Model Details

### Model Description

This model was obtained by alignment training of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) with the alignment algorithm (RAHF) introduced in ["Aligning Large Language Models with Human Preferences through Representation Engineering"](https://arxiv.org/abs/2312.15997), using the [UltraFeedback dataset](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned). The training code for RAHF is available at [this link](https://github.com/LiuAmber/RAHF). A detail worth noting is that the extracted representations are superposed onto Mistral-7B. A usage sketch is given at the end of this card.

- **Developed by:** Wenhao Liu and Xiaohua Wang
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** LoraModel
- **Language(s) (NLP):** [More Information Needed]
- **License:** apache-2.0
- **Finetuned from model [optional]:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

## Citation [optional]

**BibTeX:**

```
@article{liu2023aligning,
  title={Aligning large language models with human preferences through representation engineering},
  author={Liu, Wenhao and Wang, Xiaohua and Wu, Muling and Li, Tianlong and Lv, Changze and Ling, Zixuan and Zhu, Jianhao and Zhang, Cenyuan and Zheng, Xiaoqing and Huang, Xuanjing},
  journal={arXiv preprint arXiv:2312.15997},
  year={2023}
}
```
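
## How to Get Started with the Model

Below is a minimal sketch of how this LoRA adapter could be loaded on top of the base model with 🤗 Transformers and PEFT. The adapter repository id is a placeholder (this card does not state the Hub id), and the generation settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"
# Placeholder: replace with the actual Hub id or local path of this adapter.
adapter_id = "path/to/rahf-lora-adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the LoRA weights produced by RAHF alignment training to the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "How can I improve my time management skills?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```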