---
tags:
- biology
---

# Model Card for PlantGFM-Gene-generation

PlantGFM-Gene-generation is a gene-generation model re-trained from PlantGFM on the DNA sequences of 355,190 natural plant genes, each no longer than 4,000 base pairs. The model was re-trained with prompt-based training for two epochs, using the prompt "gene" to guide the learning process and help the model generate novel plant gene sequences that follow the patterns and structures of natural genes.

### Model Sources

- **Repository:** [PlantGFM](https://github.com/hu-lab-PlantGLM/PlantGLM)
- **Manuscript:** [A Genetic Foundation Model for Discovery and Creation of Plant Genes]()

**Developed by:** hu-lab

# How to use the model

Install the runtime library first:

```bash
pip install transformers
```

To generate a new gene sequence with the model:

```python
import torch
from torch.cuda.amp import autocast
from transformers import PreTrainedTokenizerFast

from plantgfm.configuration_plantgfm import PlantGFMConfig
from plantgfm.modeling_plantgfm import PlantGFMForCausalLM

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the config, tokenizer, and weights from the Hugging Face Hub.
config = PlantGFMConfig.from_pretrained("hu-lab/PlantGFM-Gene-generation")
tokenizer = PreTrainedTokenizerFast.from_pretrained("hu-lab/PlantGFM-Gene-generation")
model = PlantGFMForCausalLM.from_pretrained("hu-lab/PlantGFM-Gene-generation", config=config).to(device)
model = model.to(dtype=torch.bfloat16)

num_texts = 1
batch_size = 1
generated_texts = []

# Start from an empty prompt; sampling fills in the gene sequence.
input_ids = tokenizer.encode("", return_tensors="pt").to(device, dtype=torch.long)
input_ids = input_ids.expand(batch_size, -1)

for i in range(0, num_texts, batch_size):
    with autocast(dtype=torch.bfloat16):
        output = model.generate(
            input_ids=input_ids,
            max_length=4000,
            do_sample=True,
        )
    for output_sequence in output:
        generated_text = tokenizer.decode(output_sequence, skip_special_tokens=True)
        generated_texts.append(generated_text)
        print(generated_text)
```

#### Hardware

The model was trained for 15 hours on 2 NVIDIA A100-40G GPUs.
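#### Saving generated sequences

The decoded outputs are plain text. If you want to keep them for downstream analysis, the following is a minimal sketch, not part of the original card, that strips any whitespace the tokenizer may insert between bases and writes each sequence as a FASTA record; the file name `generated_genes.fasta` and the record-ID scheme are hypothetical choices.

```python
# Minimal FASTA writer for the `generated_texts` list produced above.
def save_as_fasta(sequences, path="generated_genes.fasta"):
    with open(path, "w") as handle:
        for idx, seq in enumerate(sequences):
            # Assumption: decoded text may contain spaces between single-base tokens.
            seq = seq.replace(" ", "").upper()
            handle.write(f">generated_gene_{idx}\n")
            # Wrap lines at 60 characters, the conventional FASTA width.
            for start in range(0, len(seq), 60):
                handle.write(seq[start:start + 60] + "\n")

save_as_fasta(generated_texts)
```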