metadata
license: mit
pipeline_tag: feature-extraction
tags:
  - biology
  - Gene
  - Protein
  - GO
  - MLM
  - Gene function
  - Gene Ontology
  - DAG
  - Protein function

Model Details

GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction.

Model Description

GoBERT is the first encoder to capture relations among GO functions. It can generate GO function embeddings for various biological applications related to genes or gene products. For the Gene-GO function mapping database, please refer to our previous work UniEntrezDB (UniEntrezGOA.zip at https://zenodo.org/records/13335548).

Model Sources

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, BertForPreTraining
import torch

repo_name = "MM-YY-WW/GoBERT"
tokenizer = AutoTokenizer.from_pretrained(repo_name, use_fast=False, trust_remote_code=True)
model = BertForPreTraining.from_pretrained(repo_name)

# Obtain function-level GoBERT Embedding:
input_sequences = 'GO:0005739 GO:0005783 GO:0005829 GO:0006914 GO:0006915 GO:0006979 GO:0031966 GO:0051560'
tokenized_input = tokenizer(input_sequences)
input_tensor = torch.tensor(tokenized_input['input_ids']).unsqueeze(0)
attention_mask = torch.tensor(tokenized_input['attention_mask']).unsqueeze(0)

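# Disable dropout and gradient tracking for inference
model.eval()
with torch.no_grad():
    outputs = model(input_ids=input_tensor, attention_mask=attention_mask, output_hidden_states=True)
    # Last hidden layer, one row per input token: shape (sequence_length, hidden_size)
    embedding = outputs.hidden_states[-1].squeeze(0).cpu().numpy()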
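
The model input above is a single string of space-separated GO IDs. If your annotations come as a Python list, a small helper like the one below can build that string. This is a minimal sketch, not part of the GoBERT repository: `build_go_input` is a hypothetical name, and sorting the IDs only mirrors the ascending order of the example input. Whether the model depends on that order is an assumption.

```python
import re

# GO accessions are "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")


def build_go_input(go_ids, sort=True):
    """Validate GO IDs and join them into one space-separated string.

    Hypothetical helper for illustration; not part of the GoBERT repository.
    """
    cleaned = []
    for go_id in go_ids:
        go_id = go_id.strip()
        if not GO_ID_PATTERN.fullmatch(go_id):
            raise ValueError(f"Not a valid GO ID: {go_id!r}")
        if go_id not in cleaned:  # drop duplicates, keep the first occurrence
            cleaned.append(go_id)
    # Assumption: ascending order, as in the example input above.
    if sort:
        cleaned.sort()
    return " ".join(cleaned)


input_sequences = build_go_input(
    ["GO:0006915", "GO:0005739", "GO:0005739", "GO:0031966"]
)
print(input_sequences)
```

The resulting `input_sequences` string can be passed to `tokenizer(...)` exactly as in the snippet above. Pass `sort=False` to keep the original order of the IDs.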

Citation

BibTeX:

@inproceedings{miao2025gobert,
  title={GoBERT: Gene Ontology Graph Informed BERT for Universal Gene Function Prediction},
  author={Miao, Yuwei and Guo, Yuzhi and Ma, Hehuan and Yan, Jingquan and Jiang, Feng and Liao, Rui and Huang, Junzhou},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={1},
  pages={622--630},
  year={2025},
  doi={10.1609/aaai.v39i1.32043}
}