|
---
library_name: transformers
tags: []
---
|
|
|
# MolXPT |
|
|
|
MolXPT is a GPT-style language model pre-trained on SMILES (a sequence representation of molecules) wrapped by text. It is based on [BioGPT](https://huggingface.co/microsoft/biogpt), and we redefine the tokenizer so that it covers both text and SMILES tokens.
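
During pre-training, a molecule mention in text is replaced by its SMILES string, delimited by special tags. The snippet below is a minimal illustration of this wrapped format; the example sentence is ours, but the `<start-of-mol>` / `<end-of-mol>` tags are the same ones used in the usage example below.

```python
# Illustration only: a sentence whose molecule mention ("aspirin") has been
# replaced by its tagged SMILES string, i.e. the wrapped format MolXPT sees.
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # aspirin
wrapped = f"<start-of-mol>{smiles}<end-of-mol> is an anti-inflammatory drug."
print(wrapped)
```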
|
|
|
## Example Usage |
|
```python
from transformers import AutoTokenizer, BioGptForCausalLM

# Load the model and its custom tokenizer; trust_remote_code is needed
# because MolXPT redefines the BioGPT tokenizer.
model = BioGptForCausalLM.from_pretrained("zequnl/molxpt")
molxpt_tokenizer = AutoTokenizer.from_pretrained("zequnl/molxpt", trust_remote_code=True)

model = model.cuda()
model.eval()

# SMILES strings are wrapped with <start-of-mol> / <end-of-mol> tags;
# here the model is asked to continue a description of aspirin.
input_ids = molxpt_tokenizer('<start-of-mol>CC(=O)OC1=CC=CC=C1C(=O)O<end-of-mol> is ', return_tensors="pt").input_ids.cuda()

# Sample four continuations with nucleus sampling.
output = model.generate(
    input_ids,
    max_new_tokens=300,
    num_return_sequences=4,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)

for i in range(4):
    s = molxpt_tokenizer.decode(output[i])
    print(s)
```
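
The prompt above asks the model to continue a description of a given molecule. The sketch below goes in the reverse direction, prompting with a text description and parsing a SMILES string out of the generated tags. The prompt wording and the tag-splitting logic are our own illustration, not an official API; it reuses `model` and `molxpt_tokenizer` from above.

```python
# Sketch: text-to-molecule direction. Prompt with a description, then
# extract whatever the model generates between the molecule tags.
prompt = "The molecule is an anti-inflammatory drug. <start-of-mol>"
input_ids = molxpt_tokenizer(prompt, return_tensors="pt").input_ids.cuda()
output = model.generate(input_ids, max_new_tokens=100, do_sample=True, top_p=0.95)
decoded = molxpt_tokenizer.decode(output[0])

# Keep only the text between the tags, if both are present.
if "<start-of-mol>" in decoded and "<end-of-mol>" in decoded:
    smiles = decoded.split("<start-of-mol>")[-1].split("<end-of-mol>")[0].strip()
    print(smiles)
```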
|
## References |
|
For more information, please refer to our paper and GitHub repository. |
|
|
|
Paper: [MolXPT: Wrapping Molecules with Text for Generative Pre-training](https://aclanthology.org/2023.acl-short.138/) |
|
|
|
Authors: *Zequn Liu, Wei Zhang, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Ming Zhang, Tie-Yan Liu* |