Commit c05f762 · 1 Parent(s): 542ec7f
Update README.md
README.md CHANGED
@@ -17,6 +17,17 @@ metrics:
 ---
 # ESM-2 Fine-tuned CAFA-5
 
+## ESM-2 for Protein Function Prediction
+
+This is an experimental model fine-tuned from the
+[esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D) model
+for multi-label classification. In particular, the model is fine-tuned on the CAFA-5 protein sequence dataset available
+[here](https://huggingface.co/datasets/AmelieSchreiber/cafa_5). More precisely, the `train_sequences.fasta` file is the
+list of protein sequences that were trained on, and the
+`train_terms.tsv` file contains the gene ontology protein function labels for each protein sequence. For more details on using
+ESM-2 models for multi-label sequence classification, [see here](https://huggingface.co/docs/transformers/model_doc/esm).
+Due to the potentially complicated class weighting necessary for the hierarchical ontology, further fine-tuning will be necessary.
+
 ## Training
 
 Macro
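
The added README text describes multi-label targets built from `train_terms.tsv` and mentions class weighting without giving details. As a rough illustration only, the sketch below turns (protein ID, GO term) pairs into multi-hot label vectors and derives one simple per-term weight, the ratio of negative to positive examples. The helper names, the toy rows, and the weighting scheme are assumptions made for this example; the README states neither the column layout of `train_terms.tsv` nor the weighting the author has in mind, and this scheme ignores the ontology hierarchy entirely.

```python
from collections import defaultdict


def build_multi_hot(rows):
    """Map (protein_id, go_term) pairs to multi-hot label vectors.

    Hypothetical helper: assumes the pairs were already read from
    train_terms.tsv, whose exact columns the README does not describe.
    """
    rows = list(rows)  # the pairs are iterated twice below
    terms = sorted({term for _, term in rows})
    term_index = {term: i for i, term in enumerate(terms)}
    labels = defaultdict(lambda: [0] * len(terms))
    for protein_id, term in rows:
        labels[protein_id][term_index[term]] = 1
    return terms, dict(labels)


def positive_class_weights(labels, num_terms):
    """Per-term weight = (#negatives / #positives).

    One common simple scheme, not the author's method. Every term
    collected by build_multi_hot has at least one positive, so the
    division is safe for its output.
    """
    n = len(labels)
    positives = [sum(vec[j] for vec in labels.values()) for j in range(num_terms)]
    return [(n - p) / p for p in positives]


# Toy data standing in for rows of train_terms.tsv (illustrative only).
rows = [
    ("P1", "GO:0003674"),
    ("P1", "GO:0005515"),
    ("P2", "GO:0003674"),
    ("P3", "GO:0008150"),
]
terms, labels = build_multi_hot(rows)
weights = positive_class_weights(labels, len(terms))
```

Weights of this shape are what PyTorch's `BCEWithLogitsLoss` accepts as `pos_weight`. A hierarchy-aware scheme would also need to account for parent and child GO terms, which this sketch does not attempt.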