---
license: bsd
---

This is the base model of GenomeOcean-4B. It is trained with Causal Language Modeling (CLM) and uses a BPE tokenizer with a vocabulary of 4096 tokens. It supports a maximum sequence length of 10,240 tokens (~50 kbp).

Please see our official implementation on [GitHub](https://github.com/jgi-genomeocean/genomeocean).

Quick start:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# padding_side="left" so batched prompts are padded correctly for generation
tokenizer = AutoTokenizer.from_pretrained(
    "pGenomeOcean/GenomeOcean-4B",
    trust_remote_code=True,
    padding_side="left",
)
model = AutoModelForCausalLM.from_pretrained(
    "pGenomeOcean/GenomeOcean-4B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # bf16 weights to reduce memory use
    attn_implementation="flash_attention_2",  # requires the flash-attn package
).to("cuda")
```
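Once loaded, the model behaves like any Hugging Face causal LM, so sequences can be generated with the standard `generate` API. The sketch below is illustrative, not part of the official documentation: the helper name `generate_dna`, the example prompt, and the sampling settings (`do_sample`, `top_p`, `max_new_tokens`) are assumptions, and the GPU-only loading mirrors the quick start above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_dna(model, tokenizer, prompts, max_new_tokens=256):
    """Generate DNA continuations for a batch of prompts.

    Hypothetical helper; the sampling settings are illustrative defaults.
    """
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
        )
    # Decode only the newly generated tokens, dropping the prompt prefix
    generated = outputs[:, inputs["input_ids"].shape[1]:]
    return [tokenizer.decode(seq, skip_special_tokens=True) for seq in generated]

if torch.cuda.is_available():  # loading as in the quick start above
    tokenizer = AutoTokenizer.from_pretrained(
        "pGenomeOcean/GenomeOcean-4B", trust_remote_code=True, padding_side="left"
    )
    model = AutoModelForCausalLM.from_pretrained(
        "pGenomeOcean/GenomeOcean-4B",
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
    ).to("cuda")
    print(generate_dna(model, tokenizer, ["ATGGCGTACGGCTAG"]))
```

Because left padding is set at tokenizer load time, batched prompts of different lengths are aligned at the right edge, which is what `generate` expects for decoder-only models.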

Copyright Notice

"genomeocean: a pretrained microbial genome foundational model (genomeoceanLLM)" Copyright (c) 2025, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy) and Northwestern University. All rights reserved.
31