|
---
license: bsd
---
This is the base model of GenomeOcean-4B. It is trained with Causal Language Modeling (CLM) and uses a BPE tokenizer with a 4096-token vocabulary. It supports a maximum sequence length of 10240 tokens (~50 kbp).

Please see our official implementation on our [GitHub](https://github.com/jgi-genomeocean/genomeocean).

Quick start:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Left padding is required for batched generation with causal language models.
tokenizer = AutoTokenizer.from_pretrained(
    "pGenomeOcean/GenomeOcean-4B",
    trust_remote_code=True,
    padding_side="left",
)
# Load in bfloat16 with FlashAttention-2 and move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    "pGenomeOcean/GenomeOcean-4B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
```
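Continuing from the quick-start snippet, here is a minimal generation sketch; the input DNA fragment, `max_new_tokens`, and the sampling parameters are illustrative assumptions, not values from the model card:

```python
# Hypothetical DNA fragment, used purely for illustration.
sequences = ["ATGCGTACCGGTTAGC"]

# Tokenize with padding so batched inputs align on the left.
inputs = tokenizer(
    sequences,
    return_tensors="pt",
    padding=True,
).to("cuda")

# Sample a continuation; these generation settings are assumptions.
with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=64,
        do_sample=True,
        top_p=0.95,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the tokenizer was created with `padding_side="left"`, the generated tokens follow directly after the prompt even in batched calls.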

Copyright Notice

“genomeocean: a pretrained microbial genome foundational model (genomeoceanLLM)” Copyright (c) 2025, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy) and Northwestern University. All rights reserved.