lapp0 committed
Commit 7f33657 · verified · 1 Parent(s): 86ab274

End of training

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -102,7 +102,7 @@ Trained on 145,744,973 tokens from the [wikimedia/wikipedia](https://huggingface
  # Training Objective

  ```
- DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl), attn_loss_component=LossComponent(label=attn, weight=25.0, loss_fn=cos, layer_mapper=all, projector=miles))
+ DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl), attn_loss_component=LossComponent(label=attn, weight=25.0, loss_fn=cos, layer_mapper=last_k_2, projector=miles))
  ```

  # Hyperparameters
@@ -119,9 +119,9 @@ The following hyperparameters were used during training:
  - lr_scheduler_type: `cosine_with_min_lr`
  - lr_scheduler_warmup_ratio: `0.5`
  - num_epochs: `1.0`
- - distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl), attn_loss_component=LossComponent(label=attn, weight=25.0, loss_fn=cos, layer_mapper=all, projector=miles))`
+ - distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl), attn_loss_component=LossComponent(label=attn, weight=25.0, loss_fn=cos, layer_mapper=last_k_2, projector=miles))`
  - train_embeddings: `True`
- - lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x7fabf81d14e0>`
+ - lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x7fae0078f790>`
  - student_model_name_or_path: `None`
  - student_config_name_or_path: `None`
  - student_model_config: `None`
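
For readers unfamiliar with the objective string in this diff, the sketch below shows one plausible reading of it: a KL term on the logits with weight 1 plus a cosine term on attention features with weight 25.0. It is an illustration only, not the training code behind this commit. The function names, the KL direction (teacher to student), and the `1 - cosine similarity` form of the `cos` loss are all assumptions. The `layer_mapper` (`all` vs. `last_k_2`) and the `miles` projector are not modeled.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) for one token position (direction is an assumption)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cosine_distance(a, b):
    """1 - cosine similarity between two flattened feature vectors (assumed form of `cos`)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def distillation_loss(teacher_logits, student_logits, teacher_attn, student_attn,
                      logits_weight=1.0, attn_weight=25.0):
    """Weighted sum mirroring weight=1 (logits, kl) and weight=25.0 (attn, cos)."""
    logits_term = kl_divergence(teacher_logits, student_logits)
    attn_term = cosine_distance(teacher_attn, student_attn)
    return logits_weight * logits_term + attn_weight * attn_term
```

Under this reading, a student that matches the teacher's logits but has orthogonal attention features would incur a loss equal to the attention weight alone, which shows how strongly the 25.0 weight emphasizes attention alignment relative to the logits term.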