---
license: mit
pipeline_tag: feature-extraction
tags:
- biology
- Gene
- Protein
- GO
- MLM
- Gene function
- Gene Ontology
- DAG
- Protein function
---

## Model Details

GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction.

### Model Description

GoBERT is the first encoder to capture relations among GO functions. It can generate GO function embeddings for a variety of biological applications involving genes or gene products. For the gene-to-GO function mapping database, please refer to our previous work, UniEntrezDB (`UniEntrezGOA.zip` at https://zenodo.org/records/13335548).

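
As a minimal sketch of how a gene's annotations from such a mapping might be prepared for the model, the helper below validates GO IDs and joins them into the space-separated string format used in the usage example on this card. `build_go_input` is a hypothetical name and is not part of the GoBERT repository; deduplicating and sorting the IDs are assumptions that mirror the ascending order of the example input, not documented requirements.

```python
import re

# GO identifiers have the form "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


def build_go_input(go_ids):
    """Join a gene's GO annotations into one space-separated string.

    Hypothetical helper, not part of the GoBERT repository. Deduplication
    and ascending sort are assumptions made for this sketch.
    """
    cleaned = set()
    for go_id in go_ids:
        go_id = go_id.strip()
        if not GO_ID_PATTERN.fullmatch(go_id):
            raise ValueError(f"not a valid GO ID: {go_id!r}")
        cleaned.add(go_id)
    return " ".join(sorted(cleaned))


# Toy annotation list for one gene (IDs taken from the usage example).
annotations = ["GO:0006915", "GO:0005739", "GO:0005783", "GO:0005739"]
input_sequences = build_go_input(annotations)
print(input_sequences)
```

The resulting string can be passed to the tokenizer in place of a hand-written `input_sequences` value.
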
### Model Sources

- **Repository:** https://github.com/MM-YY-WW/GoBERT
- **Paper:** GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction. (AAAI-25)
- **Demo:** https://gobert.nasy.moe/

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import AutoTokenizer, BertForPreTraining

repo_name = "MM-YY-WW/GoBERT"
# trust_remote_code=True allows the tokenizer code hosted in the repository to run.
tokenizer = AutoTokenizer.from_pretrained(repo_name, use_fast=False, trust_remote_code=True)
model = BertForPreTraining.from_pretrained(repo_name)

# Obtain function-level GoBERT embeddings.
# The input is a single string of space-separated GO IDs.
input_sequences = 'GO:0005739 GO:0005783 GO:0005829 GO:0006914 GO:0006915 GO:0006979 GO:0031966 GO:0051560'
tokenized_input = tokenizer(input_sequences)
# Add a batch dimension of size 1.
input_tensor = torch.tensor(tokenized_input['input_ids']).unsqueeze(0)
attention_mask = torch.tensor(tokenized_input['attention_mask']).unsqueeze(0)

model.eval()
with torch.no_grad():
    outputs = model(input_ids=input_tensor, attention_mask=attention_mask, output_hidden_states=True)
    # Last-layer hidden states without the batch dimension: (sequence_length, hidden_size).
    embedding = outputs.hidden_states[-1].squeeze(0).cpu().numpy()
```

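
The `embedding` array above holds one row per token. As a rough sketch of how those rows might be paired with the GO IDs that produced them, the hypothetical helper `align_go_embeddings` below (not part of the GoBERT repository) builds a dictionary from GO ID to embedding row. It assumes that the tokenizer emits exactly one token per GO ID and wraps the sequence in one leading and one trailing special token, as standard BERT tokenizers do; neither assumption is confirmed for this tokenizer, so check `tokenized_input['input_ids']` and adjust the two counts if needed.

```python
def align_go_embeddings(go_ids, rows, num_leading_special=1, num_trailing_special=1):
    """Map each GO ID to its embedding row.

    Hypothetical helper. Assumes one token per GO ID plus the given number
    of special tokens at the start and end of the sequence.
    """
    expected = len(go_ids) + num_leading_special + num_trailing_special
    if len(rows) != expected:
        raise ValueError(
            f"expected {expected} rows for {len(go_ids)} GO IDs, got {len(rows)}"
        )
    # Skip the leading special tokens, then take one row per GO ID.
    return {
        go_id: rows[num_leading_special + i] for i, go_id in enumerate(go_ids)
    }


# Toy stand-in for the real embedding: 4 rows of size 2
# (special token, two GO terms, special token).
toy_rows = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [9.0, 9.0]]
go_vectors = align_go_embeddings(["GO:0005739", "GO:0005783"], toy_rows)
print(go_vectors["GO:0005783"])
```

With the real outputs, the call would be `align_go_embeddings(input_sequences.split(), embedding)`. The length check fails loudly if the one-token-per-GO-ID assumption does not hold. Repeated GO IDs in the input collapse to a single dictionary entry.
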
## Citation

**BibTeX:**

```bibtex
@inproceedings{miao2025gobert,
  title={GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction},
  author={Miao, Yuwei and Guo, Yuzhi and Ma, Hehuan and Yan, Jingquan and Jiang, Feng and Liao, Rui and Huang, Junzhou},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={1},
  pages={622--630},
  year={2025},
  doi={10.1609/aaai.v39i1.32043}
}
```