pszemraj committed · Commit 0d68fab · verified · 1 Parent(s): 1049943

End of training
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 library_name: transformers
+language:
+- en
 license: apache-2.0
 base_model: BEE-spoke-data/ModernBERT2gpt2-700m-cfg2
 tags:
@@ -16,15 +18,15 @@ should probably proofread and complete it, then remove this comment. -->
 
 # ModernBERT2gpt2-700m-cfg2-t2t-re_pretrain-small-2048
 
-This model is a fine-tuned version of [BEE-spoke-data/ModernBERT2gpt2-700m-cfg2](https://huggingface.co/BEE-spoke-data/ModernBERT2gpt2-700m-cfg2) on an unknown dataset.
+This model is a fine-tuned version of [BEE-spoke-data/ModernBERT2gpt2-700m-cfg2](https://huggingface.co/BEE-spoke-data/ModernBERT2gpt2-700m-cfg2) on the pszemraj/t2t-re_pretrain-small dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.2095
-- Rouge1: 50.3518
-- Rouge2: 33.9831
-- Rougel: 46.3741
-- Rougelsum: 46.7798
-- Gen Len: 30.6
-- Num Input Tokens Seen: 515531508
+- Loss: 2.2113
+- Rouge1: 48.6654
+- Rouge2: 31.8667
+- Rougel: 44.9897
+- Rougelsum: 45.4126
+- Gen Len: 30.24
+- Num Input Tokens Seen: 524625736
 
 ## Model description
 
all_results.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "epoch": 0.9999754305791012,
+ "eval_gen_len": 30.24,
+ "eval_loss": 2.2113382816314697,
+ "eval_rouge1": 48.6654,
+ "eval_rouge2": 31.8667,
+ "eval_rougeL": 44.9897,
+ "eval_rougeLsum": 45.4126,
+ "eval_runtime": 91.256,
+ "eval_samples": 200,
+ "eval_samples_per_second": 2.192,
+ "eval_steps_per_second": 0.548,
+ "num_input_tokens_seen": 524625736,
+ "predict_gen_len": 64.26315789473684,
+ "predict_loss": 5.781242370605469,
+ "predict_rouge1": 12.9534,
+ "predict_rouge2": 2.8458,
+ "predict_rougeL": 9.9173,
+ "predict_rougeLsum": 11.9501,
+ "predict_runtime": 36.282,
+ "predict_samples": 19,
+ "predict_samples_per_second": 0.524,
+ "predict_steps_per_second": 0.138,
+ "total_flos": 1.8734435060870185e+18,
+ "train_loss": 51.37127453965696,
+ "train_runtime": 54286.9149,
+ "train_samples": 651215,
+ "train_samples_per_second": 11.996,
+ "train_steps_per_second": 0.187,
+ "train_tokens_per_second": 9664.096
+ }
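The throughput fields in these results can be cross-checked against each other: samples-per-second should equal the sample count divided by the runtime for each split. A minimal sanity-check sketch (the dictionary below simply hard-codes values from `all_results.json` above; the tolerance accounts for the three-decimal rounding of the reported figures):

```python
# Values copied from all_results.json above.
all_results = {
    "eval_runtime": 91.256,
    "eval_samples": 200,
    "eval_samples_per_second": 2.192,
    "predict_runtime": 36.282,
    "predict_samples": 19,
    "predict_samples_per_second": 0.524,
}

def derived_samples_per_second(results: dict, split: str) -> float:
    """Recompute samples/sec for a split from its sample count and runtime."""
    return results[f"{split}_samples"] / results[f"{split}_runtime"]

for split in ("eval", "predict"):
    reported = all_results[f"{split}_samples_per_second"]
    derived = derived_samples_per_second(all_results, split)
    # Reported values are rounded to three decimals, so allow a small tolerance.
    assert abs(reported - derived) < 5e-4, (split, reported, derived)
```

The same kind of check explains `eval_steps_per_second`: 0.548 × 91.256 ≈ 50 evaluation steps over 200 samples, which implies an eval batch size of 4.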
eval_results.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "epoch": 0.9999754305791012,
+ "eval_gen_len": 30.24,
+ "eval_loss": 2.2113382816314697,
+ "eval_rouge1": 48.6654,
+ "eval_rouge2": 31.8667,
+ "eval_rougeL": 44.9897,
+ "eval_rougeLsum": 45.4126,
+ "eval_runtime": 91.256,
+ "eval_samples": 200,
+ "eval_samples_per_second": 2.192,
+ "eval_steps_per_second": 0.548,
+ "num_input_tokens_seen": 524625736
+ }
generated_gauntlet_predictions.txt ADDED
@@ -0,0 +1,19 @@
+ We're not a good thing, but the language of words, social science, the social science approach, social science approach, social science approach, social science, social science, social science, social science, social science, social science, social science, social science, social science, social science, decoder, decoder, decoder, document embeddings, text classification, decoder, document indexing, decoder, decoder, document indexing, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, 
decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, decoder, and language
+ I Just want to recap quickly what is already announced to the class because we now have this beginning of room for everybody to joining persons, I'm not a number of questions, I'm not a page, I I I I'll be a.
+ I think we do reading as started. Thanks everybody for your patients with the hybrid learning experts. Just we face a bit of a tradeoff here.
+ I'm on the course of the science, Galilean challenge, Galilean challenge, I language, I language, I language, I language, I language, I language, I language, I language, I language, I language, I language, I language, I language, I language, I language, I language, I, I.
+ planach of language, language acquisition, language acquisition, language acquisition, language acquisition, language acquisition, language acquisition, language acquisition, language acquisition, language acquisition, language acquisition, language acquisition, language acquisition, algastic
+ The author of the documentary, the film noir, The Man Between, and the film's "mental disintegration" in the film noir.
+ The world's most famous children are in the world of world for the first time in the world of the world.
+ What the fuck did you just? I'll have you know I graduated top of my class in the navy seals, and I've been involved in numerous secret raids on Al-Qaeda, and I have over 300 confirmed kills. I am trained in gorilla warfare and I'm the top sniper in the entire US armed forces. you are nothing to me but just another target. I will wipe you the stuff out with precision the likes of which has never been seen before on this earth, mark my words.
+ Ezurich Lecture Machine Learning for Healthcare 99,604-5120-00L) Basics of ML for Medical Image Analysis Julia Vogt & Valentina Boeva & Gunnar Ratsch Institute for Machine Learning, Computer Science Department, Computer Science Department, Computer Science Department, Computer Science Department, Computer Science Department @gxr @gxr @gx/$ab # Data Data Analysis, Image classification, Image classification, Image classification, Image classification, Image classification, Active Samples, Image classification, Image classification, MRI, MRI, Active Samples, MRI, MRI, and the hippocampus, Active applications
+ Mathematics, unsupervised learning, and more.
+ MIT, research, reviews, and more notable things to know.
+ LEAD: AOLISE: The latest images of the world, soc, soc, and the image of images, are the subject of a new image-based model.
+ Cog Video: Large-scale Pretraining for Text-to-video generation, Cog Picture, trained byinheriting a pretraining text-to-image model, CogView 2,
+ MusIC ENHANCEMENT VIA IMAGEPAGESLATION AND VOCING, polyphonic signal enhancement, music perception, music perception, music perception, music perception
+ The latest news of the film's film, the film, and the film's new film, is a new chapter of the film.
+ The world's biggest ice-covered ice springs in the ice.
+ The former couple have been in the same time of the world's biggest ever.
+ The Bronx-area mansion is a new year of the year.
+ The most dangerous sight of the Caribbean sea is the most welcome he had ever seen as the beans of the Caribbean.
predict_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "predict_gen_len": 64.26315789473684,
+ "predict_loss": 5.781242370605469,
+ "predict_rouge1": 12.9534,
+ "predict_rouge2": 2.8458,
+ "predict_rougeL": 9.9173,
+ "predict_rougeLsum": 11.9501,
+ "predict_runtime": 36.282,
+ "predict_samples": 19,
+ "predict_samples_per_second": 0.524,
+ "predict_steps_per_second": 0.138
+ }
train_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "epoch": 0.9999754305791012,
+ "num_input_tokens_seen": 524625736,
+ "total_flos": 1.8734435060870185e+18,
+ "train_loss": 51.37127453965696,
+ "train_runtime": 54286.9149,
+ "train_samples": 651215,
+ "train_samples_per_second": 11.996,
+ "train_steps_per_second": 0.187,
+ "train_tokens_per_second": 9664.096
+ }
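The training throughput figures are likewise mutually consistent. A minimal sketch (values hard-coded from `train_results.json` above; I assume the Trainer derives these rates from total runtime, so the small residual on tokens/sec likely reflects rounding or timing-window details):

```python
# Values copied from train_results.json above.
train_results = {
    "num_input_tokens_seen": 524625736,
    "train_runtime": 54286.9149,  # seconds, roughly 15 hours
    "train_samples": 651215,
    "train_samples_per_second": 11.996,
    "train_tokens_per_second": 9664.096,
}

# samples/sec should equal train_samples / train_runtime
derived_sps = train_results["train_samples"] / train_results["train_runtime"]
assert abs(derived_sps - train_results["train_samples_per_second"]) < 0.01

# tokens/sec should be close to num_input_tokens_seen / train_runtime
derived_tps = train_results["num_input_tokens_seen"] / train_results["train_runtime"]
assert abs(derived_tps - train_results["train_tokens_per_second"]) < 1.0

# implied average input length per training sample over the epoch
avg_tokens = train_results["num_input_tokens_seen"] / train_results["train_samples"]
print(f"~{avg_tokens:.0f} input tokens per training sample")
```

The implied average of roughly 800 input tokens per sample is consistent with the 2048-token context suggested by the model name, with most samples shorter than the maximum.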
trainer_state.json ADDED
The diff for this file is too large to render. See raw diff