Update README.md
README.md CHANGED
@@ -1,3 +1,41 @@
----
-license: mit
-
+---
+license: mit
+language:
+- en
+---
+
+# Ettin Checkpoints
+
+[](https://opensource.org/licenses/MIT)
+[](https://arxiv.org/abs/2507.11412)
+[](https://huggingface.co/jhu-clsp)
+[](https://github.com/jhu-clsp/ettin-encoder-vs-decoder)
+
+This repository contains the raw training checkpoints for the Ettin models. Each model has its own subdirectory, e.g. `enc-150m` for Ettin-Encoder-150m, with three subfolders: `decay`, `ext`, and `pretrain`.
+
+These files work with Composer and contain all state needed to resume pre-training. Please see the [ModernBERT repository](https://github.com/AnswerDotAI/ModernBERT) for usage details.
+
+
+## 🔗 Related Resources
+
+- **Models**: [Ettin Model Suite](https://huggingface.co/collections/jhu-clsp/encoders-vs-decoders-the-ettin-suite-686303e16142257eed8e6aeb) (17M-1B parameters)
+- **Phase 1**: [Pre-training Data](https://huggingface.co/datasets/jhu-clsp/ettin-pretraining-data) (1.7T tokens)
+- **Phase 2**: [Mid-training Data](https://huggingface.co/datasets/jhu-clsp/ettin-extension-data) (250B tokens)
+- **Phase 3**: [Decay Phase Data](https://huggingface.co/datasets/jhu-clsp/ettin-decay-data) (50B tokens)
+- **Training Order**: [Batch-level Data Order](https://huggingface.co/datasets/jhu-clsp/ettin-data-order)
+- **Paper**: [Arxiv link](https://arxiv.org/abs/2507.11412)
+- **Code**: [GitHub Repository](https://github.com/jhu-clsp/ettin-encoder-vs-decoder)
+
+## Citation
+
+```bibtex
+@misc{weller2025seqvsseqopen,
+title={Seq vs Seq: An Open Suite of Paired Encoders and Decoders},
+author={Orion Weller and Kathryn Ricci and Marc Marone and Antoine Chaffin and Dawn Lawrie and Benjamin Van Durme},
+year={2025},
+eprint={2507.11412},
+archivePrefix={arXiv},
+primaryClass={cs.CL},
+url={https://arxiv.org/abs/2507.11412},
+}
+```
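
The README added in this diff describes one subdirectory per model (e.g. `enc-150m`) with `decay`, `ext`, and `pretrain` subfolders. A minimal sketch of fetching only one model's checkpoints might look like the following. It assumes the layout is literally `<model>/<phase>/...`, and the `repo_id` argument is a placeholder because the README does not name the repository. The helper names `checkpoint_patterns` and `download_checkpoints` are hypothetical; `snapshot_download` and its `allow_patterns` parameter come from `huggingface_hub`.

```python
from typing import List, Sequence

# Phase subfolder names as listed in the README. The ordering here
# (pretrain, then ext, then decay) is an assumption based on the
# Phase 1/2/3 dataset links, not something the README states.
PHASES = ("pretrain", "ext", "decay")


def checkpoint_patterns(model_dir: str, phases: Sequence[str] = PHASES) -> List[str]:
    """Build glob patterns selecting one model's checkpoint subfolders."""
    unknown = [p for p in phases if p not in PHASES]
    if unknown:
        raise ValueError(f"unknown phase(s): {unknown}")
    return [f"{model_dir}/{phase}/*" for phase in phases]


def download_checkpoints(repo_id: str, model_dir: str,
                         phases: Sequence[str] = PHASES) -> str:
    """Download only the selected subfolders and return the local path.

    repo_id is a placeholder: the README does not give the repository name.
    """
    # Imported lazily so the pattern helper works without the library installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=checkpoint_patterns(model_dir, phases),
    )
```

Restricting the download with patterns avoids pulling every model's full optimizer and model state when only one phase is needed. Loading the files into Composer to resume training is covered by the ModernBERT repository that the README links to.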