---
language:
- en
metrics:
- accuracy
- AUC ROC
- precision
- recall
tags:
- biology
- chemistry
library_name: tdc
license: mit
---
## Dataset description
An integrated Ether-a-go-go-related gene (hERG) dataset of molecular structures, given as SMILES strings and labelled as hERG blockers (<10 µM) or non-blockers (>=10 µM). It was compiled from DeepHIT, the BindingDB database, the ChEMBL bioactivity database, and other literature [1].
## Task description
Binary classification. Given a drug SMILES string, predict whether it is a hERG blocker (1, <10 µM) or not (0, >=10 µM).
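
The labelling rule above amounts to a single threshold. The sketch below is illustrative only: `label_herg_blocker` and its `potency_um` argument are hypothetical names that are not part of TDC, and it assumes the measured value is already expressed in µM.

```python
HERG_THRESHOLD_UM = 10.0  # cutoff stated in the task description


def label_herg_blocker(potency_um: float) -> int:
    """Return 1 (blocker) if the value is below 10 uM, else 0 (non-blocker)."""
    return 1 if potency_um < HERG_THRESHOLD_UM else 0


if __name__ == "__main__":
    # Hypothetical example values, not taken from the dataset.
    for value in (0.5, 10.0, 25.0):
        print(value, label_herg_blocker(value))
```

Note that a value of exactly 10 µM falls into the non-blocker class, matching the `>=10 µM` definition.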
## Dataset statistics
Total: 13445; Train_val: 12620; Test: 825
## Dataset split
Random split into 70% training, 10% validation, and 20% testing.

To load the dataset in TDC, run:
```python
from tdc.single_pred import Tox
data = Tox(name='herg_karim')
# Optional: get the train/valid/test partitions as a dict of DataFrames
split = data.get_split()
```
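
To make the random 70/10/20 split described above concrete, here is a minimal standard-library sketch. It is not TDC's implementation: `random_split`, its parameters, and the seed value are assumptions chosen for illustration, and in practice the TDC data loader's own split method would normally be used instead.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items and cut them into train/valid/test by the given fractions."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * frac[0])
    n_valid = round(len(shuffled) * frac[1])
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test set takes whatever remains, so no item is dropped.
        "test": shuffled[n_train + n_valid:],
    }


if __name__ == "__main__":
    splits = random_split(range(100))
    print({name: len(part) for name, part in splits.items()})
```

Fixing the seed keeps the partition stable across runs, which matters when comparing models on the same held-out set.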
## Model description
AttentiveFP is a molecular representation learning method based on graph attention networks. The model was tuned over 100 runs using the Ax platform.

To load the pre-trained model, run:
```python
from tdc import tdc_hf_interface
tdc_hf = tdc_hf_interface("hERG_Karim-AttentiveFP")
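# Load the DeepPurpose model from this repo, using ./data as the local directory
dp_model = tdc_hf.load_deeppurpose('./data')
# Predict hERG blockade for a list of SMILES strings (here: acetaminophen)
tdc_hf.predict_deeppurpose(dp_model, ['CC(=O)NC1=CC=C(O)C=C1'])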
```
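
The metrics listed for this model (accuracy, AUC ROC, precision, recall) can be computed from true labels and predicted scores as sketched below. This is a plain-Python illustration, not the evaluation code used for this model: the function names are hypothetical, the 0.5 decision threshold is an assumption, and the toy labels and scores are invented rather than taken from model output.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0]              # toy labels (1 = blocker)
    scores = [0.9, 0.4, 0.35, 0.8, 0.1]   # toy predicted scores
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a sketch; a library implementation would be preferable for the full test set.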
## References
[1] Karim, A., et al. CardioTox net: a robust predictor for hERG channel blockade based on deep learning meta-feature ensembles. J Cheminform 13, 60 (2021). https://doi.org/10.1186/s13321-021-00541-z