probablybots committed on
Commit 6764d34 · verified · 1 parent: 9066668

Update README.md

Files changed (1)
  1. README.md +32 -19
README.md CHANGED
@@ -1,49 +1,62 @@
  ## AIDO.RNA-650M
 
- AIDO.RNA-650M is an RNA foundation model trained on 42 million non-coding RNA sequences at single-nucleotide resolution.
 
  ## How to Use
- ### Build any downstream models from this backbone
  #### Embedding
  ```python
- from genbio_finetune.tasks import Embed
- model = Embed.from_config({"model.backbone": "rnafm_650m"}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  embedding = model(collated_batch)
  print(embedding.shape)
  print(embedding)
  ```
- #### Sequence Level Classification
  ```python
  import torch
- from genbio_finetune.tasks import SequenceClassification
- model = SequenceClassification.from_config({"model.backbone": "rnafm_650m", "model.n_classes": 2}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
- #### Token Level Classification
  ```python
  import torch
- from genbio_finetune.tasks import TokenClassification
- model = TokenClassification.from_config({"model.backbone": "rnafm_650m", "model.n_classes": 3}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
- #### Regression
  ```python
- from genbio_finetune.tasks import SequenceRegression
- model = SequenceRegression.from_config({"model.backbone": "rnafm_650m"}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  ```
- #### Or use our one-liner CLI to finetune or evaluate any of the above!
- ```
- gbft fit --model SequenceClassification --model.backbone rnafm_650m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
- gbft test --model SequenceClassification --model.backbone rnafm_650m --data SequenceClassification --data.path <hf_or_local_path_to_your_dataset>
- ```
- For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)

  ## AIDO.RNA-650M
 
+ AIDO.RNA-650M is an RNA foundation model trained on 42 million non-coding RNA sequences at single-nucleotide resolution.
+ For a more detailed description, refer to the state-of-the-art model in this collection, [AIDO.RNA-1.6B](https://huggingface.co/genbio-ai/AIDO.RNA-1.6B).
+
 
  ## How to Use
+ ### Build any downstream models from this backbone with ModelGenerator
+ For more information, visit: [Model Generator](https://github.com/genbio-ai/modelgenerator)
+ ```bash
+ mgen fit --model SequenceClassification --model.backbone aido_rna_650m --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
+ mgen test --model SequenceClassification --model.backbone aido_rna_650m --data SequenceClassificationDataModule --data.path <hf_or_local_path_to_your_dataset>
+ ```
+
+ ### Or use directly in Python
  #### Embedding
  ```python
+ from modelgenerator.tasks import Embed
+ model = Embed.from_config({"model.backbone": "aido_rna_650m"}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  embedding = model(collated_batch)
  print(embedding.shape)
  print(embedding)
  ```
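The embedding example above prints the raw `embedding` tensor. Assuming it holds one hidden vector per input position, i.e. shape `(batch, seq_len, hidden)` (an assumption; confirm with `embedding.shape`, and check whether special-token positions are included), a common way to get one fixed-size vector per sequence is masked mean pooling. The sketch below uses plain Python lists so it runs without the model; `mean_pool` and the 0/1 padding-mask layout are hypothetical and not part of ModelGenerator.

```python
from typing import List

Matrix = List[List[float]]


def mean_pool(embedding: List[Matrix], mask: List[List[int]]) -> Matrix:
    """Average token vectors per sequence, skipping positions where mask == 0.

    Hypothetical helper: assumes embedding is (batch, seq_len, hidden) and
    mask is (batch, seq_len) with 1 for real nucleotides and 0 for padding.
    """
    pooled: Matrix = []
    for tokens, keep in zip(embedding, mask):
        hidden = len(tokens[0])
        total = [0.0] * hidden
        count = 0
        for vec, m in zip(tokens, keep):
            if m:
                # Accumulate only non-padding positions.
                total = [t + v for t, v in zip(total, vec)]
                count += 1
        # Guard against an all-padding row to avoid dividing by zero.
        pooled.append([t / max(count, 1) for t in total])
    return pooled


# Toy stand-in for a (2, 3, 2) embedding; the last position of sequence 2 is padding.
toy_embedding = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [9.0, 9.0]],
]
toy_mask = [[1, 1, 1], [1, 1, 0]]
print(mean_pool(toy_embedding, toy_mask))  # [[3.0, 4.0], [3.0, 1.0]]
```

With PyTorch tensors the same reduction is usually written as `(embedding * mask.unsqueeze(-1)).sum(1) / mask.sum(1, keepdim=True).clamp(min=1)`, with `mask` a float tensor of shape `(batch, seq_len)`.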
+ #### Sequence-level Classification
  ```python
  import torch
+ from modelgenerator.tasks import SequenceClassification
+ model = SequenceClassification.from_config({"model.backbone": "aido_rna_650m", "model.n_classes": 2}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
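The sequence-level classification example reduces logits to a class index with `torch.argmax(logits, dim=-1)`. When per-class scores are also useful, a softmax over the last dimension turns each row of logits into probabilities that sum to 1. The plain-Python sketch below assumes logits of shape `(batch, n_classes)`; `softmax_rows` and `argmax_rows` are hypothetical helpers, and the toy values are made up.

```python
import math
from typing import List


def softmax_rows(logits: List[List[float]]) -> List[List[float]]:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    probs = []
    for row in logits:
        top = max(row)
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def argmax_rows(rows: List[List[float]]) -> List[int]:
    """Index of the largest value in each row, like torch.argmax(x, dim=-1)."""
    return [max(range(len(r)), key=r.__getitem__) for r in rows]


# Toy logits for 2 sequences and n_classes = 2 (values are made up).
toy_logits = [[2.0, 0.0], [-1.0, 1.0]]
probs = softmax_rows(toy_logits)
print(argmax_rows(toy_logits), argmax_rows(probs))  # [0, 1] [0, 1]
```

Softmax is monotonic within each row, so the predicted class does not change; in PyTorch the equivalent call is `torch.softmax(logits, dim=-1)`.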
+ #### Token-level Classification
  ```python
  import torch
+ from modelgenerator.tasks import TokenClassification
+ model = TokenClassification.from_config({"model.backbone": "aido_rna_650m", "model.n_classes": 3}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  print(torch.argmax(logits, dim=-1))
  ```
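For token-level classification the logits carry one score vector per position, so `torch.argmax(logits, dim=-1)` yields one class index per nucleotide. The sketch below assumes logits of shape `(batch, seq_len, n_classes)` with no special-token positions (not verified here), and maps indices to a hypothetical three-label dot-bracket scheme; `LABELS` and `decode_tokens` are illustrative only, so substitute your task's own labels for `"model.n_classes": 3`.

```python
from typing import Dict, List

# Hypothetical label names for n_classes = 3; substitute your task's labels.
LABELS: Dict[int, str] = {0: ".", 1: "(", 2: ")"}


def decode_tokens(logits: List[List[List[float]]], labels: Dict[int, str]) -> List[str]:
    """Per-position argmax over classes, then join label symbols per sequence."""
    decoded = []
    for seq in logits:
        idx = [max(range(len(scores)), key=scores.__getitem__) for scores in seq]
        decoded.append("".join(labels[i] for i in idx))
    return decoded


# Toy logits for 2 sequences of 4 nucleotides and 3 classes (values are made up).
toy_logits = [
    [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.1, 0.9], [3.0, 0.0, 0.0]],
    [[2.2, 0.1, 0.0], [0.3, 1.1, 0.2], [0.1, 0.2, 1.7], [0.9, 0.1, 0.4]],
]
print(decode_tokens(toy_logits, LABELS))  # ['(.).', '.().']
```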
+ #### Sequence-level Regression
  ```python
+ from modelgenerator.tasks import SequenceRegression
+ model = SequenceRegression.from_config({"model.backbone": "aido_rna_650m"}).eval()
  collated_batch = model.collate({"sequences": ["ACGT", "AGCT"]})
  logits = model(collated_batch)
  print(logits)
  ```
+
+ ### Get RNA sequence embedding
+ ```python
+ from modelgenerator.tasks import Embed
+ model = Embed.from_config({"model.backbone": "aido_rna_650m"}).eval()
+ collated_batch = model.collate({"sequences": ["ACGT", "ACGT"]})
+ embedding = model(collated_batch)
+ print(embedding.shape)
+ print(embedding)
+ ```