File size: 2,835 Bytes
7890149 d59d1b6 0cdc013 7890149 d59d1b6 da70735 d59d1b6 4b24a65 cd78944 4b24a65 cd78944 4b24a65 62e4f69 cd78944 4b24a65 cd78944 4b24a65 62e4f69 cd78944 4b24a65 cd78944 4b24a65 62e4f69 cd78944 4b24a65 cd78944 4b24a65 62e4f69 cd78944 4b24a65 62e4f69 4b24a65 cd78944 4b24a65 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 |
---
base_model:
- genbio-ai/AIDO.RNA-650M
license: other
---
# AIDO.RNA-650M-CDS
AIDO.RNA-650M-CDS is a domain adaptation model on the coding sequences.
It was pre-trained on 9 million coding sequences released by Carlos et al. (2024) [1] based on our [AIDO.RNA-650M](https://huggingface.co/genbio-ai/AIDO.RNA-650M) model.
For a more detailed description, refer to the SOTA model in this collection https://huggingface.co/genbio-ai/AIDO.RNA-1.6B
## How to Use
### Build any downstream models from this backbone with ModelGenerator
For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)
```bash
mgen fit --model SequenceClassification --model.backbone aido_rna_650m_cds --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
mgen test --model SequenceClassification --model.backbone aido_rna_650m_cds --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
```
### Or use directly in Python
#### Embedding
```python
from modelgenerator.tasks import Embed
model = Embed.from_config({"model.backbone": "aido_rna_650m_cds"}).eval()
transformed_batch = model.transform({"sequences": ["ACGT", "AGCT"]})
embedding = model(transformed_batch)
print(embedding.shape)
print(embedding)
```
#### Sequence-level Classification
```python
import torch
from modelgenerator.tasks import SequenceClassification
model = SequenceClassification.from_config({"model.backbone": "aido_rna_650m_cds", "model.n_classes": 2}).eval()
transformed_batch = model.transform({"sequences": ["ACGT", "AGCT"]})
logits = model(transformed_batch)
print(logits)
print(torch.argmax(logits, dim=-1))
```
#### Token-level Classification
```python
import torch
from modelgenerator.tasks import TokenClassification
model = TokenClassification.from_config({"model.backbone": "aido_rna_650m_cds", "model.n_classes": 3}).eval()
transformed_batch = model.transform({"sequences": ["ACGT", "AGCT"]})
logits = model(transformed_batch)
print(logits)
print(torch.argmax(logits, dim=-1))
```
#### Sequence-level Regression
```python
from modelgenerator.tasks import SequenceRegression
model = SequenceRegression.from_config({"model.backbone": "aido_rna_650m_cds"}).eval()
transformed_batch = model.transform({"sequences": ["ACGT", "AGCT"]})
logits = model(transformed_batch)
print(logits)
```
### Get RNA sequence embedding
```python
from genbio_finetune.tasks import Embed
model = Embed.from_config({"model.backbone": "aido_rna_650m_cds"}).eval()
transformed_batch = model.transform({"sequences": ["ACGT", "ACGT"]})
embedding = model(transformed_batch)
print(embedding.shape)
print(embedding)
```
## Reference
1. Carlos Outeiral and Charlotte M Deane. Codon language embeddings provide strong signals for use in protein engineering. Nature Machine Intelligence, 6(2):170–179, 2024. |