# ModernBERT2gpt2-700m baseline

EncoderDecoder created from modernBERT-large and a random-init `gpt2` decoder, trained on the pszemraj/t2t-re_pretrain-small dataset for one epoch as a "baseline".

- input context length: 2048
- output context length: 512
- single tokenizer, slightly modified from modernBERT

Logs and the training script can be found [on wandb](https://wandb.ai/pszemraj/enc-dec-modernbert-olmo/runs/xpg9wjco).

---

It achieves the following results on the evaluation set:

- Loss: 2.2113
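As a rough sketch of how such an encoder-decoder can be wired up with `transformers` (tiny random configs stand in here for modernBERT-large and the ~700M decoder; this is illustrative, not the actual training script):

```python
import torch
from transformers import (
    BertConfig, BertModel,
    GPT2Config, GPT2LMHeadModel,
    EncoderDecoderModel,
)

# Tiny stand-in configs -- the real model uses modernBERT-large as the
# encoder and a randomly initialized gpt2-style decoder.
enc_cfg = BertConfig(
    vocab_size=128, hidden_size=32, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=64,
)
dec_cfg = GPT2Config(
    vocab_size=128, n_embd=32, n_layer=2, n_head=2,
    is_decoder=True,
    add_cross_attention=True,  # decoder attends to encoder hidden states
)

encoder = BertModel(enc_cfg)        # random init here; the card loads pretrained weights
decoder = GPT2LMHeadModel(dec_cfg)  # random init, matching the card's setup
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)

# One forward pass; in the real model the encoder side goes up to 2048
# tokens and the decoder side up to 512.
src = torch.randint(0, 128, (1, 16))
tgt = torch.randint(0, 128, (1, 8))
out = model(input_ids=src, decoder_input_ids=tgt, labels=tgt)
print(out.logits.shape)  # (batch, tgt_len, vocab) = (1, 8, 128)
```

With real checkpoints, `EncoderDecoderModel.from_encoder_decoder_pretrained` is the usual shortcut for pairing a pretrained encoder with a decoder.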