Update readme (#2)
- Update readme (209ee0da5de1160dd1bcc9e6525627b39684ae5b)
Co-authored-by: Vishrut Thoutam <[email protected]>
README.md CHANGED

@@ -7,18 +7,41 @@ extra_gated_fields:
   Specific date: date_picker
   I want to use this model for:
     type: select
-    options:
-
-
-
-
+    options:
+    - Research
+    - Education
+    - label: Other
+      value: other
   I agree to share generated sequences and associated data with authors before publishing: checkbox
   I agree not to file patents on any sequences generated by this model: checkbox
   I agree to use this model for non-commercial use ONLY: checkbox
+base_model:
+- facebook/esm2_t30_150M_UR50D
+pipeline_tag: fill-mask
 ---
 
 # MeMDLM: De Novo Membrane Protein Design with Masked Diffusion Language Models
 
 
 
-Masked Diffusion Language Models (MDLMs), introduced by Sahoo et al (arxiv.org/pdf/2406.07524), provide strong generative capabilities to BERT-style models. In this work, we pre-train and fine-tune ESM-2-150M on the MDLM objective to scaffold functional motifs while unconditionally generating realistic, high-quality membrane protein sequences.
+Masked Diffusion Language Models (MDLMs), introduced by Sahoo et al. (arxiv.org/pdf/2406.07524), provide strong generative capabilities to BERT-style models. In this work, we pre-train and fine-tune ESM-2-150M on the MDLM objective to scaffold functional motifs while unconditionally generating realistic, high-quality membrane protein sequences.
+
+## Model Usage
+
+MeMDLM wraps an internal backbone model, a fine-tuned ESM-2 (150M). This backbone can be used directly through this repo:
+
+```python
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+tokenizer = AutoTokenizer.from_pretrained("ChatterjeeLab/MeMDLM")
+model = AutoModelForMaskedLM.from_pretrained("ChatterjeeLab/MeMDLM")
+
+input_sequence = "QMMALTFITYIGCGLSSIFLSVTLVILIQLCAALLLLNLIFLLDSWIALYnTRGFCIAVAVFLHYFLLVSFTWMGLEAFHMYLKFCIVGWGIPAVVVSIVLTISPDNYGidFCWINSNVVFYITVVGYFCVIFLLNVSMFIVVLVQLCRIKKKKQLGDL"
+
+inputs = tokenizer(input_sequence, return_tensors="pt")
+output = model(**inputs)
+
+filled_protein_seq = tokenizer.decode(output.logits.argmax(dim=-1).squeeze())  # most-likely residue at each position, decoded back into a protein sequence
+```
+
+This backbone model can be integrated with the [MDLM formulation](https://github.com/kuleshov-group/mdlm) by setting the model backbone type to "hf_dit" and the HuggingFace model ID to "ChatterjeeLab/MeMDLM".
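
Concretely, that integration drives an iterative unmasking loop over the backbone. The sketch below emulates such a loop using only the `transformers` API from the diff above; the 20-residue scaffold length and the highest-confidence-first unmasking order are illustrative assumptions, not the MDLM repo's actual ancestral sampler.

```python
# A minimal sketch, not the MDLM repo's sampler: fill mask tokens with the
# MeMDLM backbone by unmasking the highest-confidence position each step.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ChatterjeeLab/MeMDLM")
model = AutoModelForMaskedLM.from_pretrained("ChatterjeeLab/MeMDLM")
model.eval()

# Keep a motif fixed and scaffold a masked region after it
# (the motif and the 20-residue scaffold length are illustrative).
motif = "QMMALTFITYIGCGLSSIFLSVTLVILIQLCAALLLLNLIFLLDSW"
masked_sequence = motif + tokenizer.mask_token * 20

input_ids = tokenizer(masked_sequence, return_tensors="pt")["input_ids"]
mask_id = tokenizer.mask_token_id

with torch.no_grad():
    while (input_ids == mask_id).any():
        logits = model(input_ids=input_ids).logits
        # per-position best residue and its probability
        confidence, prediction = logits.softmax(dim=-1).max(dim=-1)
        confidence[input_ids != mask_id] = -1.0  # only masked positions compete
        position = confidence.argmax()           # flat index into the (1, L) tensor
        input_ids.view(-1)[position] = prediction.view(-1)[position]

print(tokenizer.decode(input_ids.squeeze()))
```

The MDLM codebase replaces this hand-rolled loop with its own diffusion sampler; as the commit describes, only the backbone type ("hf_dit") and the model ID ("ChatterjeeLab/MeMDLM") need to be supplied there.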