---
library_name: transformers
license: apache-2.0
base_model: BEE-spoke-data/ModernBERT2gpt2-700m-cfg2
tags:
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: ModernBERT2gpt2-700m-cfg2-t2t-re_pretrain-small-2048
    results: []
---

# ModernBERT2gpt2-700m-cfg2-t2t-re_pretrain-small-2048

This model is a fine-tuned version of [BEE-spoke-data/ModernBERT2gpt2-700m-cfg2](https://huggingface.co/BEE-spoke-data/ModernBERT2gpt2-700m-cfg2) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 2.2095
- Rouge1: 50.3518
- Rouge2: 33.9831
- Rougel: 46.3741
- Rougelsum: 46.7798
- Gen Len: 30.6
- Num Input Tokens Seen: 515531508
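
As the name suggests, the checkpoint pairs a ModernBERT encoder with a gpt2 decoder. Below is a minimal usage sketch, assuming the checkpoint loads through the standard seq2seq auto classes and that the repo id matches the card title under the uploader's namespace (the `pszemraj/...` id is an assumption; adjust to the actual repo):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# repo id assumed from the card title and uploader; change if the actual repo differs
model_id = "pszemraj/ModernBERT2gpt2-700m-cfg2-t2t-re_pretrain-small-2048"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "The quick brown fox jumps over the lazy dog."
# the "2048" in the model name suggests a 2048-token input window
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)

# eval Gen Len averaged ~30 tokens, so a modest generation budget is reasonable
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```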

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `Seq2SeqTrainingArguments` sketch follows the list):

- learning_rate: 8e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 80085
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: PAGED_ADEMAMIX (no additional optimizer arguments)
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
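
As a rough illustration, these settings map onto the transformers trainer roughly as below. This is a reconstruction, not the exact training script; the `optim="paged_ademamix"` string is assumed from `OptimizerNames.PAGED_ADEMAMIX` (it requires `bitsandbytes`), and `output_dir` is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

# sketch reconstructing the reported hyperparameters; output_dir is a placeholder
training_args = Seq2SeqTrainingArguments(
    output_dir="./ModernBERT2gpt2-700m-cfg2-t2t-re_pretrain-small-2048",
    learning_rate=8e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=80085,
    gradient_accumulation_steps=16,  # 4 x 16 = effective train batch of 64
    optim="paged_ademamix",          # assumed string for OptimizerNames.PAGED_ADEMAMIX
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
    predict_with_generate=True,      # needed to report ROUGE and Gen Len during eval
)
```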

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|:-----------------:|
| 90.1752       | 0.0983 | 1000  | 5.6342          | 16.5561 | 3.2961  | 14.7126 | 14.7712   | 69.32   | 51291360          |
| 65.7669       | 0.1966 | 2000  | 4.0524          | 27.4318 | 11.4034 | 24.5864 | 24.8835   | 41.59   | 102933044         |
| 51.9327       | 0.2948 | 3000  | 3.2430          | 40.1723 | 21.3863 | 36.5277 | 36.8678   | 30.495  | 154351440         |
| 41.8728       | 0.3931 | 4000  | 2.8102          | 43.9268 | 26.793  | 40.1378 | 40.7026   | 30.17   | 205979564         |
| 41.7305       | 0.4914 | 5000  | 2.6100          | 44.4312 | 27.6447 | 40.525  | 40.7945   | 32.985  | 257628708         |
| 41.428        | 0.5897 | 6000  | 2.4841          | 44.7711 | 28.0903 | 40.7346 | 40.9658   | 35.03   | 309218384         |
| 36.5789       | 0.6879 | 7000  | 2.3844          | 44.8011 | 28.0367 | 40.8555 | 41.1516   | 30.805  | 360560352         |
| 36.1657       | 0.7862 | 8000  | 2.3185          | 46.647  | 29.8361 | 42.7361 | 43.0175   | 35.32   | 412353688         |
| 33.1455       | 0.8845 | 9000  | 2.2608          | 48.6856 | 32.331  | 44.6585 | 45.0587   | 36.3    | 463798308         |
| 36.9318       | 0.9828 | 10000 | 2.2095          | 50.3518 | 33.9831 | 46.3741 | 46.7798   | 30.6    | 515531508         |
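
The ROUGE columns follow the usual summarization-evaluation convention (F-measures scaled by 100). A minimal sketch of how such scores are typically computed with the `evaluate` library, not necessarily the exact metric code used for this run:

```python
# requires: pip install evaluate rouge_score
import evaluate

rouge = evaluate.load("rouge")

# toy predictions/references to illustrate the metric call
predictions = ["the cat sat on the mat"]
references = ["the cat lay on the mat"]

scores = rouge.compute(predictions=predictions, references=references)
# keys: rouge1, rouge2, rougeL, rougeLsum as floats in [0, 1]
print(scores)
```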

### Framework versions

- Transformers 4.49.0.dev0
- Pytorch 2.4.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0