language:
- en
license: mit
tags:
- chemistry
- SMILES
- product
datasets:
- ORD
metrics:
- accuracy
Model Card for ReactionT5-product-prediction
This is a ReactionT5 model pre-trained to predict the products of chemical reactions. You can try it in the demo at https://huggingface.co/spaces/sagawa/predictproduct-t5.
Model Details
Model Sources
- Repository: https://github.com/sagawatatsuya/ReactionT5
- Paper: [More Information Needed]
- Demo: https://huggingface.co/spaces/sagawa/predictproduct-t5
Uses
How to Get Started with the Model
Download files and use the code below to get started with the model.
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Load the tokenizer and the model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('sagawa/ReactionT5-product-prediction')
model = T5ForConditionalGeneration.from_pretrained('sagawa/ReactionT5-product-prediction')

# The input is a single string of the form 'REACTANT:<SMILES>REAGENT:<SMILES>'.
inp = tokenizer('REACTANT:COC(=O)C1=CCCN(C)C1.O.[Al+3].[H-].[Li+].[Na+].[OH-]REAGENT:C1CCOC1', return_tensors='pt')
# num_beams=1 disables beam search; a single sequence is returned.
output = model.generate(**inp, min_length=6, max_length=109, num_beams=1, num_return_sequences=1, return_dict_in_generate=True, output_scores=True)
# Decode the token IDs, then remove spaces and any trailing '.' from the predicted SMILES.
output = tokenizer.decode(output['sequences'][0], skip_special_tokens=True).replace(' ', '').rstrip('.')
output # 'O=S(=O)([O-])[O-].O=S(=O)([O-])[O-].O=S(=O)([O-])[O-].[Cr+3].[Cr+3]'
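The input string and the post-processing in the snippet above can be wrapped in small helpers. The following is a minimal sketch, not part of the ReactionT5 repository: the function names `build_input` and `clean_prediction` are hypothetical, and the prompt layout is inferred only from the single example above. How a reaction without reagents should be encoded is not stated in this card, so the sketch does not treat that case specially.

```python
from typing import Sequence


def build_input(reactants: Sequence[str], reagents: Sequence[str]) -> str:
    """Join SMILES strings into the 'REACTANT:...REAGENT:...' layout (hypothetical helper)."""
    if not reactants:
        raise ValueError("at least one reactant SMILES is required")
    # Components on each side are separated by '.', as in the example above.
    return "REACTANT:" + ".".join(reactants) + "REAGENT:" + ".".join(reagents)


def clean_prediction(decoded: str) -> str:
    """Apply the same post-processing as the snippet above (hypothetical helper)."""
    # Drop spaces between decoded tokens, then strip any trailing '.'.
    return decoded.replace(" ", "").rstrip(".")


prompt = build_input(
    ["COC(=O)C1=CCCN(C)C1", "O", "[Al+3]", "[H-]", "[Li+]", "[Na+]", "[OH-]"],
    ["C1CCOC1"],
)
print(prompt)
```

The resulting `prompt` can be passed to `tokenizer(prompt, return_tensors='pt')`, and `clean_prediction` can be applied to the string returned by `tokenizer.decode`.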
Training Details
Training Procedure
We used the Open Reaction Database (ORD) dataset for model training. The following command was used for training. For more information, please refer to the paper and the GitHub repository.
python train.py \
--epochs=100 \
--batch_size=32 \
--data_path='../data/all_ord_reaction_uniq_with_attr_v3.csv' \
--use_reconstructed_data \
--pretrained_model_name_or_path='sagawa/CompoundT5'
Results
Model | Training set | Test set | Top-1 [% acc.] | Top-2 [% acc.] | Top-3 [% acc.] | Top-5 [% acc.] |
---|---|---|---|---|---|---|
Sequence-to-sequence | USPTO | USPTO | 80.3 | 84.7 | 86.2 | 87.5 |
WLDN | USPTO | USPTO | 80.6 (85.6) | 90.5 | 92.8 | 93.4 |
Molecular Transformer | USPTO | USPTO | 88.8 | 92.6 | - | 94.4 |
T5Chem | USPTO | USPTO | 90.4 | 94.2 | - | 96.4 |
CompoundT5 | USPTO | USPTO | 88.0 | 92.4 | 93.9 | 95.0 |
ReactionT5 | ORD | USPTO | 0.0 <85.0> | 0.0 <90.6> | 0.0 <92.3> | 0.0 <93.8> |
Performance comparison of CompoundT5, ReactionT5, and other models in product prediction. The values enclosed in "<>" in the table are the scores of the model after fine-tuning on 200 reactions from the USPTO dataset. The score enclosed in "()" is the one reported in the original paper.
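For readers unfamiliar with the Top-k columns, the metric can be sketched as follows: a test reaction counts as correct at rank k if the reference product appears among the model's first k candidates. This is an illustrative sketch only. The function name `top_k_accuracy` and the toy SMILES are hypothetical, and exact string matching is an assumption; the card does not state how predictions were compared to references (for example, whether SMILES were canonicalized first).

```python
from typing import Sequence


def top_k_accuracy(
    candidates: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Percentage of targets found among the first k ranked candidates."""
    if len(candidates) != len(targets):
        raise ValueError("candidates and targets must have the same length")
    if not targets:
        raise ValueError("at least one example is required")
    # Assumption: a hit is an exact string match against the reference product.
    hits = sum(
        1 for ranked, target in zip(candidates, targets) if target in ranked[:k]
    )
    return 100.0 * hits / len(targets)


# Toy data: ranked candidate lists for three reactions (made-up SMILES).
ranked_predictions = [["CCO", "CC"], ["C", "CO"], ["N", "O"]]
reference_products = ["CCO", "CO", "C"]

top1 = top_k_accuracy(ranked_predictions, reference_products, k=1)
top2 = top_k_accuracy(ranked_predictions, reference_products, k=2)
print(f"Top-1: {top1:.1f}%  Top-2: {top2:.1f}%")
```

With the generation call shown earlier, ranked candidates would require `num_beams` and `num_return_sequences` of at least k, since a single greedy sequence only supports Top-1.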
Citation [optional]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]