Update readme (#2)
- Update readme (209ee0da5de1160dd1bcc9e6525627b39684ae5b)
Co-authored-by: Vishrut Thoutam <[email protected]>
README.md CHANGED

@@ -7,18 +7,41 @@ extra_gated_fields:
   Specific date: date_picker
   I want to use this model for:
     type: select
-    options:
-
-
-
-
+    options:
+    - Research
+    - Education
+    - label: Other
+      value: other
   I agree to share generated sequences and associated data with authors before publishing: checkbox
   I agree not to file patents on any sequences generated by this model: checkbox
   I agree to use this model for non-commercial use ONLY: checkbox
+base_model:
+- facebook/esm2_t30_150M_UR50D
+pipeline_tag: fill-mask
 ---
 
 # MeMDLM: De Novo Membrane Protein Design with Masked Diffusion Language Models
 
 
 
-Masked Diffusion Language Models (MDLMs), introduced by Sahoo et al (arxiv.org/pdf/2406.07524), provide strong generative capabilities to BERT-style models. In this work, we pre-train and fine-tune ESM-2-150M on the MDLM objective to scaffold functional motifs while unconditionally generating realistic, high-quality membrane protein sequences.
+Masked Diffusion Language Models (MDLMs), introduced by Sahoo et al. (arxiv.org/pdf/2406.07524), provide strong generative capabilities to BERT-style models. In this work, we pre-train and fine-tune ESM-2-150M on the MDLM objective to scaffold functional motifs while unconditionally generating realistic, high-quality membrane protein sequences.
+
+## Model Usage
+
+MeMDLM wraps an internal backbone model, a fine-tuned ESM-2 (150M). This backbone can be used directly through this repo:
+
+```python
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+tokenizer = AutoTokenizer.from_pretrained("ChatterjeeLab/MeMDLM")
+model = AutoModelForMaskedLM.from_pretrained("ChatterjeeLab/MeMDLM")
+
+input_sequence = "QMMALTFITYIGCGLSSIFLSVTLVILIQLCAALLLLNLIFLLDSWIALYnTRGFCIAVAVFLHYFLLVSFTWMGLEAFHMYLKFCIVGWGIPAVVVSIVLTISPDNYGidFCWINSNVVFYITVVGYFCVIFLLNVSMFIVVLVQLCRIKKKKQLGDL"
+
+inputs = tokenizer(input_sequence, return_tensors="pt")
+output = model(**inputs)
+
+filled_protein_seq = tokenizer.decode(output.logits.argmax(dim=-1).squeeze())  # most-likely residue at each position, decoded back into a protein sequence
+```
+
+This backbone model can be integrated with the [MDLM formulation](https://github.com/kuleshov-group/mdlm) by setting the model backbone type to "hf_dit" and the HuggingFace model ID to "ChatterjeeLab/MeMDLM".
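
Concretely, that integration drives an iterative unmasking loop over the backbone. The sketch below emulates such a loop using only the `transformers` API from the diff above; the 20-residue scaffold length and the highest-confidence-first unmasking order are illustrative assumptions, not the MDLM repo's actual ancestral sampler.

```python
# A minimal sketch, not the MDLM repo's sampler: fill mask tokens with the
# MeMDLM backbone by unmasking the highest-confidence position each step.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ChatterjeeLab/MeMDLM")
model = AutoModelForMaskedLM.from_pretrained("ChatterjeeLab/MeMDLM")
model.eval()

# Keep a motif fixed and scaffold a masked region after it
# (the motif and the 20-residue scaffold length are illustrative).
motif = "QMMALTFITYIGCGLSSIFLSVTLVILIQLCAALLLLNLIFLLDSW"
masked_sequence = motif + tokenizer.mask_token * 20

input_ids = tokenizer(masked_sequence, return_tensors="pt")["input_ids"]
mask_id = tokenizer.mask_token_id

with torch.no_grad():
    while (input_ids == mask_id).any():
        logits = model(input_ids=input_ids).logits
        # per-position best residue and its probability
        confidence, prediction = logits.softmax(dim=-1).max(dim=-1)
        confidence[input_ids != mask_id] = -1.0  # only masked positions compete
        position = confidence.argmax()           # flat index into the (1, L) tensor
        input_ids.view(-1)[position] = prediction.view(-1)[position]

print(tokenizer.decode(input_ids.squeeze()))
```

The MDLM codebase replaces this hand-rolled loop with its own diffusion sampler; as the commit describes, only the backbone type ("hf_dit") and the model ID ("ChatterjeeLab/MeMDLM") need to be supplied there.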