Yanisadel committed · commit d108dd4 · verified · parent: f86a0fe

Update README.md

Files changed (1): README.md (+45 -1)

### How to use

Until its next release, the `transformers` library needs to be installed from source with the following command in order to use the model. PyTorch should also be installed to one-hot encode the input sequences.

```
pip install --upgrade git+https://github.com/huggingface/transformers.git
pip install torch
```
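
To quickly check that the from-source install is active, you can print the installed version (a source build of `transformers` typically reports a `.dev0` suffix; the exact version string will vary):

```
import transformers

print(transformers.__version__)  # a from-source build typically ends in ".dev0"
```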

A small snippet of code is given below to retrieve the logits from dummy DNA sequences.

```
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("InstaDeepAI/segment_borzoi", trust_remote_code=True)

def encode_sequences(sequences):
    # Map each nucleotide (upper or lower case) to a one-hot vector;
    # 'n' / 'N' (unknown base) maps to all zeros.
    one_hot_map = {
        'a': torch.tensor([1., 0., 0., 0.]),
        'c': torch.tensor([0., 1., 0., 0.]),
        'g': torch.tensor([0., 0., 1., 0.]),
        't': torch.tensor([0., 0., 0., 1.]),
        'n': torch.tensor([0., 0., 0., 0.]),
        'A': torch.tensor([1., 0., 0., 0.]),
        'C': torch.tensor([0., 1., 0., 0.]),
        'G': torch.tensor([0., 0., 1., 0.]),
        'T': torch.tensor([0., 0., 0., 1.]),
        'N': torch.tensor([0., 0., 0., 0.])
    }

    def encode_sequence(seq_str):
        # Any character outside the map falls back to a uniform 0.25 vector.
        one_hot_list = []
        for char in seq_str:
            one_hot_vector = one_hot_map.get(char, torch.tensor([0.25, 0.25, 0.25, 0.25]))
            one_hot_list.append(one_hot_vector)
        return torch.stack(one_hot_list)

    # Accept either a single sequence or a list of sequences.
    if isinstance(sequences, list):
        return torch.stack([encode_sequence(seq) for seq in sequences])
    else:
        return encode_sequence(sequences)

# Two dummy sequences of 524,288 bp each.
sequences = ["A" * 524_288, "G" * 524_288]
one_hot_encoding = encode_sequences(sequences)
preds = model(one_hot_encoding)
print(preds['logits'])
```
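
For long inputs like these (524,288 bp per sequence), the per-character Python loop above can be slow. As a purely illustrative sketch (the `lookup` table and `encode_sequence_fast` helper below are hypothetical, not part of this repository), the same one-hot scheme can be vectorized with a byte-indexed lookup table; note that this variant sends unmapped characters to all zeros instead of the uniform 0.25 fallback used above.

```
import torch

# Hypothetical vectorized variant of encode_sequences (a sketch, not the
# repository's API): a 256 x 4 lookup table indexed by ASCII byte value.
lookup = torch.zeros(256, 4)
for i, base in enumerate("ACGT"):
    lookup[ord(base)] = torch.eye(4)[i]
    lookup[ord(base.lower())] = torch.eye(4)[i]

def encode_sequence_fast(seq_str):
    # View the string as raw bytes and index the table in one shot.
    idx = torch.frombuffer(bytearray(seq_str.encode("ascii")), dtype=torch.uint8)
    return lookup[idx.long()]

batch = torch.stack([encode_sequence_fast(s) for s in ["A" * 524_288, "G" * 524_288]])
print(batch.shape)  # torch.Size([2, 524288, 4])
```

Either encoding produces the same (batch, length, 4) float tensor for A/C/G/T inputs; only the fallback for unexpected characters differs.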

## Training data