---
metrics:
- accuracy
- mse
library_name: transformers
tags:
- biology
---
This is **roberta-base trained on DNA promoter sequences of plants and fine-tuned on gene expression values (normalized to TPM)** measured in 8 tissues of maize cultivars, with each set of expression values paired with its corresponding promoter sequence.
Currently, this model is trained on **11.7 million plant DNA promoter sequences**. The model has 47 million parameters.
References:
- [GitHub Repository](https://github.com/gurveervirk/florabert/)
- [Kaggle Dataset](https://www.kaggle.com/datasets/gurveersinghvirk/florabert-base)
- [Video Demo](https://youtu.be/WZzjHH740kw)
To get predictions for **DNA promoter sequences of plants** directly from the console / command line, add a text file containing the sequences (one sequence per line) to the data folder, then call the main() function in prediction.py with your file name.
For example:
- Update ```main("test.txt")``` with your file name
- Now, run ```python prediction.py```
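
The input file for the steps above can also be prepared programmatically. The following is a minimal sketch, not part of the repository: the helper `write_sequences` is hypothetical, and the allowed alphabet (A, C, G, T, N) is an assumption rather than a documented requirement of prediction.py. It only covers the format stated above (one sequence per line, saved in the data folder).

```python
from pathlib import Path

# Assumption: sequences use the standard DNA alphabet plus N for unknown bases.
VALID_BASES = set("ACGTN")


def write_sequences(sequences, path):
    """Write one sequence per line to `path` and return how many were written.

    Hypothetical helper: strips surrounding whitespace, skips blank entries,
    and rejects sequences containing characters outside VALID_BASES.
    """
    cleaned = []
    for seq in sequences:
        seq = seq.strip()
        if not seq:
            continue  # skip blank entries
        invalid = set(seq.upper()) - VALID_BASES
        if invalid:
            raise ValueError(f"unexpected characters {sorted(invalid)} in sequence")
        cleaned.append(seq)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/ if it is missing
    path.write_text("\n".join(cleaned) + "\n")
    return len(cleaned)


if __name__ == "__main__":
    # "data/test.txt" matches the file name used in the example above.
    count = write_sequences(["ACGTTGCA", "GGGCCCAT"], "data/test.txt")
    print(f"wrote {count} sequences")
```

After the file is written, the two steps above apply unchanged.
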
The results will be visible in tabular format in the console.
For example,
| tassel | base | anther | middle | ear | shoot | tip | root |
|--------|--------|--------|--------|--------|--------|--------|--------|
| 8.65 | 7.901 | 2.004 | 8.4001 | 7.523 | 6.23 | 9.0112 | 8.221 |
Each value in the table is the predicted TPM (transcripts per million) for the corresponding tissue. TPM is a normalized measure of gene expression.
Both models can also be used for further pretraining and fine-tuning (see the references for more information).
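
As background on the normalization, TPM is conventionally computed by dividing each gene's read count by its transcript length and then rescaling so that all values in a sample sum to one million. The sketch below illustrates that general definition only. The function name and the sample numbers are made up for illustration, and it does not describe how the training data for this model was processed or whether any further transformation was applied.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to TPM using the conventional definition.

    counts     -- read counts per gene
    lengths_kb -- transcript lengths per gene, in kilobases
    """
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    # Step 1: length-normalize each count (reads per kilobase).
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    # Step 2: rescale so the values in the sample sum to one million.
    return [r / total * 1_000_000 for r in rates]


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the dataset.
    values = tpm([10, 20, 30], [1.0, 2.0, 3.0])
    print(values)
```

Because the values always sum to one million within a sample, TPM describes relative expression and is comparable across samples with different sequencing depths.
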