Model Card for Mistral-7B-LoRA-RAHF-DUAL

Model Details

Model Description

This model was obtained by alignment training of mistralai/Mistral-7B-Instruct-v0.2 on the UltraFeedback dataset, using the RAHF alignment algorithm described in "Aligning Large Language Models with Human Preferences through Representation Engineering".

The training code for RAHF is available at this link.

One detail worth noting: the extracted representations are superposed onto Mistral-7B.
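The snippet below is a minimal usage sketch, not part of the original card: it assumes the adapter is loaded with peft on top of the base model, and that the adapter repo id is Liuwenhao2022/Mistral-7B-LoRA-RAHF-DUAL (the repository this card belongs to). Generation settings are illustrative only.

# Minimal sketch (assumption, not the authors' reference code): attach the LoRA adapter
# from this repository to the Mistral-7B-Instruct-v0.2 base model and generate a reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "Liuwenhao2022/Mistral-7B-LoRA-RAHF-DUAL"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Load the RAHF LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

# Build a prompt with the instruct chat template and generate.
messages = [{"role": "user", "content": "Explain RLHF in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))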

  • Developed by: Wenhao Liu and Xiaohua Wang
  • Model type: LoraModel
  • Language(s) (NLP): [More Information Needed]
  • License: apache-2.0
  • Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2

Citation

BibTeX:

@article{liu2023aligning,
  title={Aligning large language models with human preferences through representation engineering},
  author={Liu, Wenhao and Wang, Xiaohua and Wu, Muling and Li, Tianlong and Lv, Changze and Ling, Zixuan and Zhu, Jianhao and Zhang, Cenyuan and Zheng, Xiaoqing and Huang, Xuanjing},
  journal={arXiv preprint arXiv:2312.15997},
  year={2023}
}