End of training
README.md CHANGED

@@ -78,7 +78,7 @@ GPT2LMHeadModel(
 
 # Resource Usage
 
-- Max Train VRAM Use:
+- Max Train VRAM Use: 13.7815 GB
 - Available VRAM: 23.4329 GB
 - GPUs:
 - 1x NVIDIA GeForce RTX 4090
@@ -115,7 +115,7 @@ GPT2LMHeadModel(
 <br/>
 
 # Train Dataset
-Trained on 525,
+Trained on 525,579,616 tokens from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
 
 - Num Samples: `998,000`
 - Subset: `20231101.en`
@@ -134,11 +134,7 @@ DistillationObjective(
         weight=0
     ),
     attn_loss_component=LossComponent(
-        weight=
-        loss_fn='raw_mse',
-        layer_mapper='layer-2',
-        norm='layernorm_teacher_only_affine',
-        projector='orthogonal'
+        weight=0
     )
 )
 ```
@@ -165,14 +161,10 @@ The following hyperparameters were used during training:
         weight=0
     ),
     attn_loss_component=LossComponent(
-        weight=
-        loss_fn='raw_mse',
-        layer_mapper='layer-2',
-        norm='layernorm_teacher_only_affine',
-        projector='orthogonal'
+        weight=0
     )
 )`
-- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at
+- lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x710a3dbe03a0>`
 - student_model_name_or_path: `None`
 - student_config_name_or_path: `distilbert/distilgpt2`
 - student_model_config: `None`
logs/attn_weight=0.0/events.out.tfevents.1726105148.1c1a426a2fee ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c99482e6c95932d020f3af668376ea7268eb59d8a81ae92be8f4886be03ead19
+size 529
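The added log file is stored via Git LFS, so the repository only holds a three-line pointer: the spec version, the blob's SHA-256 object id, and its size in bytes. A small sketch of reading such a pointer (the `parse_lfs_pointer` helper is hypothetical, written against the pointer text shown above):

```python
# The pointer text exactly as committed for the added log file.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c99482e6c95932d020f3af668376ea7268eb59d8a81ae92be8f4886be03ead19
size 529
"""

def parse_lfs_pointer(text: str) -> dict:
    # Each line is "key value"; the oid value carries an "algo:digest" prefix.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"],
            "algo": algo,
            "digest": digest,
            "size": int(fields["size"])}

info = parse_lfs_pointer(POINTER)
print(info["size"])  # 529
print(info["algo"])  # sha256
```

The actual 529-byte TensorBoard events file lives on the LFS server and is fetched by oid on checkout.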