flan-context

This model is a fine-tuned version of google/flan-t5-base on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7001
  • Rouge1: 0.2323, Rouge2: 0.0773, RougeL: 0.1962, RougeLsum: 0.1966
  • Bleu: 0.0288 (n-gram precisions: 0.3552 / 0.1116 / 0.0528 / 0.0272; brevity penalty: 0.3320; length ratio: 0.4756; translation length: 3477; reference length: 7311)
  • Bertscore Precision: 0.8829
  • Bertscore Recall: 0.8627
  • Bertscore F1: 0.8726
  • Meteor: 0.1620
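
The low BLEU score is driven largely by the brevity penalty: generated outputs total 3,477 tokens against 7,311 reference tokens, so the model's summaries run well under half the reference length. As a sanity check, the reported penalty can be reproduced from the standard BLEU formula, BP = exp(1 − r/c) when the candidate is shorter than the reference:

```python
import math

# Reproduce the reported BLEU brevity penalty from the evaluation lengths above.
translation_length = 3477  # total tokens generated by the model
reference_length = 7311    # total tokens in the references

# Standard BLEU brevity penalty, applied because translation_length < reference_length.
bp = math.exp(1 - reference_length / translation_length)
print(round(bp, 4))  # 0.332, matching the reported 0.33198...
```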

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 20
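
The per-device batch size of 4 with 2 gradient-accumulation steps yields the total train batch size of 8. Assuming zero warmup steps (the card does not state any), the linear scheduler decays the learning rate from 1e-4 to 0 over the 3,760 optimizer steps logged below; a minimal sketch:

```python
# Linear LR decay sketch. Assumes no warmup, which the card does not state.
def linear_lr(step, base_lr=1e-4, total_steps=3760):
    """Learning rate at a given optimizer step under pure linear decay."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

# Effective batch size: per-device batch of 4 x 2 gradient-accumulation steps.
effective_batch = 4 * 2  # = total_train_batch_size of 8

print(linear_lr(0))     # 1e-4 at the start of training
print(linear_lr(1880))  # 5e-05 at the halfway point
```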

Training results

(Rouge sub-scores and the composite Bleu score are shown rounded to four decimals; the final row matches the evaluation results reported above.)

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Bleu | Bertscore Precision | Bertscore Recall | Bertscore F1 | Meteor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.8853 | 0.9973 | 188 | 2.4748 | 0.2148 | 0.0743 | 0.1868 | 0.1871 | 0.0206 | 0.8914 | 0.8594 | 0.8749 | 0.1479 |
| 2.5541 | 2.0 | 377 | 2.4110 | 0.2295 | 0.0775 | 0.1971 | 0.1973 | 0.0255 | 0.8866 | 0.8605 | 0.8733 | 0.1571 |
| 2.3627 | 2.9973 | 565 | 2.3796 | 0.2303 | 0.0794 | 0.2006 | 0.2004 | 0.0258 | 0.8858 | 0.8609 | 0.8730 | 0.1614 |
| 2.1889 | 4.0 | 754 | 2.3669 | 0.2198 | 0.0725 | 0.1882 | 0.1885 | 0.0257 | 0.8828 | 0.8598 | 0.8710 | 0.1487 |
| 2.0543 | 4.9973 | 942 | 2.3862 | 0.2261 | 0.0761 | 0.1946 | 0.1947 | 0.0288 | 0.8823 | 0.8609 | 0.8714 | 0.1555 |
| 1.9291 | 6.0 | 1131 | 2.3932 | 0.2288 | 0.0797 | 0.1988 | 0.1988 | 0.0289 | 0.8816 | 0.8608 | 0.8710 | 0.1550 |
| 1.8217 | 6.9973 | 1319 | 2.4167 | 0.2391 | 0.0796 | 0.2027 | 0.2033 | 0.0283 | 0.8835 | 0.8614 | 0.8722 | 0.1604 |
| 1.7252 | 8.0 | 1508 | 2.4362 | 0.2387 | 0.0857 | 0.2052 | 0.2052 | 0.0312 | 0.8839 | 0.8625 | 0.8730 | 0.1655 |
| 1.6454 | 8.9973 | 1696 | 2.4714 | 0.2325 | 0.0830 | 0.1996 | 0.1998 | 0.0306 | 0.8842 | 0.8631 | 0.8734 | 0.1637 |
| 1.5742 | 10.0 | 1885 | 2.4923 | 0.2321 | 0.0808 | 0.2008 | 0.2009 | 0.0293 | 0.8821 | 0.8614 | 0.8715 | 0.1566 |
| 1.5129 | 10.9973 | 2073 | 2.5331 | 0.2282 | 0.0771 | 0.1940 | 0.1946 | 0.0298 | 0.8814 | 0.8620 | 0.8715 | 0.1586 |
| 1.4482 | 12.0 | 2262 | 2.5347 | 0.2355 | 0.0832 | 0.2015 | 0.2016 | 0.0306 | 0.8838 | 0.8631 | 0.8732 | 0.1621 |
| 1.4032 | 12.9973 | 2450 | 2.5527 | 0.2331 | 0.0820 | 0.1980 | 0.1982 | 0.0304 | 0.8830 | 0.8623 | 0.8724 | 0.1631 |
| 1.349 | 14.0 | 2639 | 2.6143 | 0.2325 | 0.0802 | 0.1954 | 0.1957 | 0.0293 | 0.8838 | 0.8626 | 0.8729 | 0.1609 |
| 1.3214 | 14.9973 | 2827 | 2.6210 | 0.2326 | 0.0825 | 0.1952 | 0.1956 | 0.0307 | 0.8829 | 0.8626 | 0.8725 | 0.1635 |
| 1.2863 | 16.0 | 3016 | 2.6274 | 0.2359 | 0.0808 | 0.1989 | 0.1995 | 0.0291 | 0.8829 | 0.8623 | 0.8724 | 0.1635 |
| 1.259 | 16.9973 | 3204 | 2.6495 | 0.2295 | 0.0738 | 0.1927 | 0.1926 | 0.0253 | 0.8818 | 0.8616 | 0.8714 | 0.1590 |
| 1.2333 | 18.0 | 3393 | 2.6829 | 0.2305 | 0.0756 | 0.1949 | 0.1952 | 0.0280 | 0.8821 | 0.8621 | 0.8718 | 0.1600 |
| 1.2287 | 18.9973 | 3581 | 2.6920 | 0.2292 | 0.0765 | 0.1955 | 0.1954 | 0.0286 | 0.8826 | 0.8625 | 0.8723 | 0.1607 |
| 1.2018 | 19.9469 | 3760 | 2.7001 | 0.2323 | 0.0773 | 0.1962 | 0.1966 | 0.0288 | 0.8829 | 0.8627 | 0.8726 | 0.1620 |
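
Validation loss bottoms out at epoch 4 (2.3669 at step 754) and rises steadily thereafter while training loss keeps falling, a typical overfitting pattern; the final epoch-20 checkpoint reported above is not the lowest-loss one. A quick check over the logged values:

```python
# Per-epoch validation losses copied from the training-results table above.
val_losses = [2.4748, 2.4110, 2.3796, 2.3669, 2.3862, 2.3932, 2.4167, 2.4362,
              2.4714, 2.4923, 2.5331, 2.5347, 2.5527, 2.6143, 2.6210, 2.6274,
              2.6495, 2.6829, 2.6920, 2.7001]

# Epoch (1-indexed) with the lowest validation loss.
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__) + 1
print(best_epoch, val_losses[best_epoch - 1])  # 4 2.3669
```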

Framework versions

  • Transformers 4.46.3
  • Pytorch 2.4.1+cu118
  • Datasets 3.1.0
  • Tokenizers 0.20.3
Model size: 248M parameters (Safetensors, F32)