---
title: README
emoji: π
colorFrom: pink
colorTo: blue
sdk: static
pinned: false
---
# Neural Bioinformatics Research Group - ProkBERT Models

Welcome to the official Hugging Face organization of the Neural Bioinformatics Research Group. Our main goal is to provide genomic language models for microbiome applications.
## Models

We provide a collection of pretrained and fine-tuned models from the ProkBERT family. These models are built on Local Context Aware (LCA) tokenization, tailored for DNA sequences to balance context size and performance.

ProkBERT models are designed for microbiome-related tasks, such as prokaryotic promoter identification or phage detection, and remain efficient despite their compact size.
## Model Overview

| Model       | Parameters | Tokenizer      | Layers | Attention Heads | Max. Context Size | Training Data (nt) |
|-------------|------------|----------------|--------|-----------------|-------------------|--------------------|
| `mini`      | 20.6M      | 6-mer, shift=1 | 6      | 6               | 1027 nt           | 206.65 billion     |
| `mini-c`    | 24.9M      | 1-mer          | 6      | 6               | 1022 nt           | 206.65 billion     |
| `mini-long` | 26.6M      | 6-mer, shift=2 | 6      | 6               | 4096 nt           | 206.65 billion     |

_An overview of model parameters across the available configurations._
## Resources

- [Read our paper](https://www.frontiersin.org/articles/10.3389/fmicb.2023.1331233/full)
- [Learn more about the model](https://github.com/nbrg-ppcu/prokbert)
- [Get started with code on GitHub](https://github.com/nbrg-ppcu/prokbert/tree/main?tab=readme-ov-file#tutorials-and-examples)
---

For more information or questions, please visit our [GitHub repository](https://github.com/nbrg-ppcu/prokbert) or contact us at [obalasz@gmail.com](mailto:obalasz@gmail.com).