End of training
README.md CHANGED
@@ -16,13 +16,13 @@ This student model is distilled from the teacher model [gpt2](https://huggingfac
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
-- eval_enwikippl:
-- eval_frwikippl:
-- eval_zhwikippl:
-- eval_loss:
-- eval_runtime: 21.
-- eval_samples_per_second: 45.
-- eval_steps_per_second: 11.
+- eval_enwikippl: 26003.4414
+- eval_frwikippl: 43473.625
+- eval_zhwikippl: 54798.5430
+- eval_loss: 21585.9199
+- eval_runtime: 21.7886
+- eval_samples_per_second: 45.896
+- eval_steps_per_second: 11.474
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -65,20 +65,20 @@ Peak GPU Memory: 4.5037 GB
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 30.2385 | 57.2728 | | | | | 18.1772 |
-| 0 | 0 |
-| 500 | 0.0808 |
-| 1000 | 0.1616 |
-| 1500 | 0.2424 |
-| 2000 | 0.3232 |
-| 2500 | 0.4040 |
-| 3000 | 0.4848 |
-| 3500 | 0.5657 |
-| 4000 | 0.6465 |
-| 4500 | 0.7273 |
-| 5000 | 0.8081 |
-| 5500 | 0.8889 |
-| 6000 | 0.9697 |
-| 6187 | 0.9999 |
+| 0 | 0 | 55339.3672 | 57682.5742 | 331776.0 | 21.609 | 46.277 | 11.569 | 57080.2930 |
+| 500 | 0.0808 | 53840.9336 | 57103.8711 | 31504.6406 | 21.8206 | 45.828 | 11.457 | 60063.5586 |
+| 1000 | 0.1616 | 46110.3789 | 54346.3320 | 25851.3926 | 21.7004 | 46.082 | 11.521 | 58033.3359 |
+| 1500 | 0.2424 | 39930.7539 | 50785.9883 | 24363.0078 | 21.7826 | 45.908 | 11.477 | 56878.6953 |
+| 2000 | 0.3232 | 35821.5273 | 48514.4766 | 23500.8008 | 21.6304 | 46.231 | 11.558 | 56064.2539 |
+| 2500 | 0.4040 | 33513.9102 | 47385.3516 | 23009.5352 | 22.046 | 45.36 | 11.34 | 55873.6484 |
+| 3000 | 0.4848 | 31516.0898 | 46269.4453 | 22568.4473 | 21.8604 | 45.745 | 11.436 | 55709.7695 |
+| 3500 | 0.5657 | 30457.4590 | 45776.25 | 22369.2793 | 21.741 | 45.996 | 11.499 | 55598.2578 |
+| 4000 | 0.6465 | 29546.6035 | 45307.4453 | 22169.5996 | 21.7185 | 46.044 | 11.511 | 55524.0742 |
+| 4500 | 0.7273 | 28461.1484 | 44691.9258 | 21980.1602 | 21.6611 | 46.166 | 11.541 | 55228.2812 |
+| 5000 | 0.8081 | 27586.4121 | 44246.7188 | 21925.6328 | 21.7331 | 46.013 | 11.503 | 55025.875 |
+| 5500 | 0.8889 | 26811.3066 | 43867.7734 | 21713.1523 | 21.755 | 45.966 | 11.492 | 54930.3984 |
+| 6000 | 0.9697 | 26139.0703 | 43621.0156 | 21624.0645 | 21.6556 | 46.177 | 11.544 | 54864.4336 |
+| 6187 | 0.9999 | 26003.4414 | 43473.625 | 21585.9199 | 21.7886 | 45.896 | 11.474 | 54798.5430 |
 
 ### Framework versions
 - Distily 0.2.0
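As a reading aid for the README.md change, the Python sketch below takes the final-row metrics (step 6187) and the teacher eval row and derives two things: the approximate size of the evaluation pass and the student-to-teacher perplexity ratio. The helper names `eval_shape` and `ppl_gap` are hypothetical and not part of Distily. The sample count and batch size are inferred from the reported throughput figures; the model card does not state them.

```python
# Values copied from the step 6187 row and the teacher eval row of the table.
final = {
    "enwikippl": 26003.4414,
    "frwikippl": 43473.625,
    "zhwikippl": 54798.5430,
    "runtime": 21.7886,             # seconds
    "samples_per_second": 45.896,
    "steps_per_second": 11.474,
}
teacher = {"enwikippl": 30.2385, "frwikippl": 57.2728, "zhwikippl": 18.1772}


def eval_shape(row):
    """Infer (samples, steps, batch size) of one eval pass from throughput.

    Assumption: runtime * rate recovers the counts up to rounding.
    """
    samples = round(row["runtime"] * row["samples_per_second"])
    steps = round(row["runtime"] * row["steps_per_second"])
    batch_size = round(row["samples_per_second"] / row["steps_per_second"])
    return samples, steps, batch_size


def ppl_gap(student, reference):
    """Ratio of student perplexity to teacher perplexity per dataset."""
    return {name: student[name] / reference[name] for name in reference}


if __name__ == "__main__":
    print(eval_shape(final))
    for name, ratio in ppl_gap(final, teacher).items():
        print(f"{name}: {ratio:.1f}x teacher perplexity")
```

Under these assumptions the eval pass works out to about 1000 samples in 250 steps, that is, a batch size of 4. The student's perplexity after one epoch is still several hundred to a few thousand times the teacher's, depending on the dataset.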
runs/Aug10_06-50-39_93d6cbb3ad53/events.out.tfevents.1723276912.93d6cbb3ad53 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:008a85dbbd7a24fdc998c3ca660036353b33486ab3679afffa30cc2226ed79c8
+size 249
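The three added lines are a Git LFS pointer file: the repository stores this small text stub, while the TensorBoard event file itself (249 bytes) lives in LFS storage. The Python sketch below parses such a pointer into its fields. The function name `parse_lfs_pointer` is hypothetical, and the parser only handles the three keys shown in this diff.

```python
import re

# Pointer contents copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:008a85dbbd7a24fdc998c3ca660036353b33486ab3679afffa30cc2226ed79c8
size 249
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value

    # The oid is written as '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256" or not re.fullmatch(r"[0-9a-f]{64}", digest):
        raise ValueError("unexpected oid format")

    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER_TEXT))
```

To check a downloaded copy of the event file against the pointer, compare its SHA-256 digest and byte length with the `digest` and `size` fields.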