|
---
license: bsd
---
This is the base model of GenomeOcean-4B. It is trained with Causal Language Modeling (CLM) and uses a BPE tokenizer with a 4096-token vocabulary. It supports a maximum sequence length of 10240 tokens (~50 kbp).

Please see our official implementation on our [GitHub](https://github.com/jgi-genomeocean/genomeocean).

Quick start:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Left padding is required for batched generation with causal language models.
tokenizer = AutoTokenizer.from_pretrained(
    "pGenomeOcean/GenomeOcean-4B",
    trust_remote_code=True,
    padding_side="left",
)
# Load in bfloat16 with FlashAttention-2 and move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    "pGenomeOcean/GenomeOcean-4B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
```
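Continuing from the quick-start snippet, here is a minimal generation sketch; the input DNA fragment, `max_new_tokens`, and the sampling parameters are illustrative assumptions, not values from the model card:

```python
# Hypothetical DNA fragment, used purely for illustration.
sequences = ["ATGCGTACCGGTTAGC"]

# Tokenize with padding so batched inputs align on the left.
inputs = tokenizer(
    sequences,
    return_tensors="pt",
    padding=True,
).to("cuda")

# Sample a continuation; these generation settings are assumptions.
with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=64,
        do_sample=True,
        top_p=0.95,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the tokenizer was created with `padding_side="left"`, the generated tokens follow directly after the prompt even in batched calls.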

Copyright Notice

“genomeocean: a pretrained microbial genome foundational model (genomeoceanLLM)” Copyright (c) 2025, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy) and Northwestern University. All rights reserved.