Commit · 1638c07
1 Parent(s): 4b2429b
Update README.md
README.md CHANGED
@@ -8,6 +8,9 @@ This model uses a unique distillation method called ‘transformer-layer distill
 This model uses 4 hidden layers with a hidden dimension size and an embedding size of 768 resulting in a total of 15M parameters. Due to the small hidden dimension size used in this model, it uses a random initialisation.
 
 # Citation
+
+If you use this model, please consider citing the following paper:
+
 ```bibtex
 @misc{https://doi.org/10.48550/arxiv.2209.03182,
 doi = {10.48550/ARXIV.2209.03182},