---
license: mit
pipeline_tag: feature-extraction
tags:
- biology
- Gene
- Protein
- GO
- MLM
- Gene function
- Gene Ontology
- DAG
- Protein function
---

## Model Details
GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction.

### Model Description
GoBERT is the first encoder to capture relations among GO functions. It can generate GO function embeddings for a variety of biological applications related to genes or gene products. For the gene-to-GO-function mapping database, please refer to our previous work UniEntrezDB (UniEntrezGOA.zip at https://zenodo.org/records/13335548).



### Model Sources 

- **Repository:** https://github.com/MM-YY-WW/GoBERT
- **Paper:** GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction (AAAI-25)
- **Demo:** https://gobert.nasy.moe/

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, BertForPreTraining
import torch

repo_name = "MM-YY-WW/GoBERT"
tokenizer = AutoTokenizer.from_pretrained(repo_name, use_fast=False, trust_remote_code=True)
model = BertForPreTraining.from_pretrained(repo_name)

# Obtain function-level GoBERT embeddings for a gene, represented as a
# space-separated sequence of its GO annotations:
input_sequence = 'GO:0005739 GO:0005783 GO:0005829 GO:0006914 GO:0006915 GO:0006979 GO:0031966 GO:0051560'
tokenized_input = tokenizer(input_sequence)
input_tensor = torch.tensor(tokenized_input['input_ids']).unsqueeze(0)        # add batch dimension
attention_mask = torch.tensor(tokenized_input['attention_mask']).unsqueeze(0)

model.eval()
with torch.no_grad():
    outputs = model(input_ids=input_tensor, attention_mask=attention_mask, output_hidden_states=True)
    # Last hidden layer: one vector per input token, shape (seq_len, hidden_size)
    embedding = outputs.hidden_states[-1].squeeze(0).cpu().numpy()
```
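The embedding above contains one vector per GO-term token. If a single gene-level vector is needed (e.g., for downstream similarity or clustering), one common option is mean pooling over the tokens under the attention mask. Below is a minimal sketch reusing `outputs` and `attention_mask` from the snippet above; the pooling choice is an assumption, not something this model card prescribes:

```python
# Hypothetical helper: mean-pool per-token embeddings into one gene-level vector.
# Mean pooling is an assumed choice; other schemes (CLS token, max pooling) also work.
def gene_embedding(outputs, attention_mask):
    hidden = outputs.hidden_states[-1]                   # (batch, seq_len, hidden_size)
    mask = attention_mask.unsqueeze(-1).type_as(hidden)  # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)                  # sum over non-padding tokens
    counts = mask.sum(dim=1).clamp(min=1)                # avoid division by zero
    return summed / counts                               # (batch, hidden_size)

gene_vector = gene_embedding(outputs, attention_mask).squeeze(0).cpu().numpy()
```

Two such gene vectors can then be compared with, e.g., cosine similarity.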

## Citation

**BibTeX:**

```bibtex
@inproceedings{miao2025gobert,
  title={GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction},
  author={Miao, Yuwei and Guo, Yuzhi and Ma, Hehuan and Yan, Jingquan and Jiang, Feng and Liao, Rui and Huang, Junzhou},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={1},
  pages={622--630},
  year={2025},
  doi={10.1609/aaai.v39i1.32043}
}
```