pszemraj committed
Commit 398075c · verified · 1 Parent(s): ea48760

Model save

Files changed (3)
  1. README.md +91 -0
  2. generation_config.json +12 -12
  3. model.safetensors +1 -1
README.md ADDED
@@ -0,0 +1,91 @@
+ ---
+ license: apache-2.0
+ base_model: pszemraj/mega-ar-525m-v0.06-fw_longish
+ tags:
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ model-index:
+ - name: mega-ar-525m-v0.06-fw_longish-UltraTextbooks-2.1-fw_mix-v2
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # mega-ar-525m-v0.06-fw_longish-UltraTextbooks-2.1-fw_mix-v2
+
+ This model is a fine-tuned version of [pszemraj/mega-ar-525m-v0.06-fw_longish](https://huggingface.co/pszemraj/mega-ar-525m-v0.06-fw_longish) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.9851
+ - Accuracy: 0.5870
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.000135
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 1608
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 32
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 4
+ - optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-08
+ - lr_scheduler_type: inverse_sqrt
+ - lr_scheduler_warmup_ratio: 0.05
+ - num_epochs: 1.0
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |:-------------:|:------:|:-----:|:---------------:|:--------:|
+ | 2.1884 | 0.0375 | 400 | 2.2307 | 0.5469 |
+ | 2.1215 | 0.0751 | 800 | 2.1776 | 0.5555 |
+ | 2.1304 | 0.1126 | 1200 | 2.1419 | 0.5614 |
+ | 2.0552 | 0.1502 | 1600 | 2.1181 | 0.5653 |
+ | 2.07 | 0.1877 | 2000 | 2.1014 | 0.5679 |
+ | 2.0684 | 0.2252 | 2400 | 2.0869 | 0.5700 |
+ | 2.0446 | 0.2628 | 2800 | 2.0748 | 0.5724 |
+ | 2.0709 | 0.3003 | 3200 | 2.0626 | 0.5745 |
+ | 2.0322 | 0.3378 | 3600 | 2.0540 | 0.5756 |
+ | 1.9582 | 0.3754 | 4000 | 2.0474 | 0.5764 |
+ | 1.9826 | 0.4129 | 4400 | 2.0387 | 0.5781 |
+ | 1.992 | 0.4505 | 4800 | 2.0343 | 0.5789 |
+ | 1.9851 | 0.4880 | 5200 | 2.0278 | 0.5802 |
+ | 1.982 | 0.5255 | 5600 | 2.0233 | 0.5807 |
+ | 1.964 | 0.5631 | 6000 | 2.0193 | 0.5813 |
+ | 2.0033 | 0.6006 | 6400 | 2.0162 | 0.5818 |
+ | 1.9992 | 0.6381 | 6800 | 2.0135 | 0.5824 |
+ | 2.0023 | 0.6757 | 7200 | 2.0078 | 0.5833 |
+ | 1.977 | 0.7132 | 7600 | 2.0050 | 0.5838 |
+ | 1.9846 | 0.7508 | 8000 | 2.0001 | 0.5846 |
+ | 1.9894 | 0.7883 | 8400 | 1.9983 | 0.5850 |
+ | 1.9683 | 0.8258 | 8800 | 1.9948 | 0.5854 |
+ | 1.9349 | 0.8634 | 9200 | 1.9930 | 0.5857 |
+ | 1.9911 | 0.9009 | 9600 | 1.9885 | 0.5865 |
+ | 1.9578 | 0.9384 | 10000 | 1.9860 | 0.5870 |
+ | 1.9099 | 0.9760 | 10400 | 1.9851 | 0.5870 |
+
+
+ ### Framework versions
+
+ - Transformers 4.40.2
+ - Pytorch 2.3.0+cu121
+ - Datasets 2.19.1
+ - Tokenizers 0.19.1
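
The hyperparameters above imply an effective batch size of 1 × 4 × 32 = 128, matching the reported `total_train_batch_size`. A minimal sketch of that arithmetic and of the general shape of an `inverse_sqrt` schedule (plain Python; the exact trainer implementation may differ in small details such as shift/timescale handling, and the warmup step count below is a hypothetical value for illustration):

```python
import math

# Values from the "Training hyperparameters" section above.
learning_rate = 0.000135
train_batch_size = 1          # per-device
num_devices = 4
gradient_accumulation_steps = 32

# Effective (total) train batch size: per-device batch x devices x accumulation.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 128, matching the reported value


def inverse_sqrt_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup, then LR decays proportional to 1/sqrt(step).

    This mirrors the usual shape of an inverse_sqrt schedule; it is a
    sketch, not the exact code path of the trainer used here.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * math.sqrt(warmup_steps / step)


warmup = 500  # hypothetical warmup step count, for illustration only
print(inverse_sqrt_lr(warmup, learning_rate, warmup))      # 0.000135 (peak LR)
print(inverse_sqrt_lr(4 * warmup, learning_rate, warmup))  # 6.75e-05 (halved at 4x warmup)
```

The 1/sqrt decay means the LR falls slowly after warmup, which is why the validation loss in the table keeps improving steadily through the single epoch.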
generation_config.json CHANGED
@@ -1,13 +1,13 @@
  {
- "_from_model_config":true,
- "bos_token_id":128000,
- "eos_token_id":128001,
- "max_new_tokens":64,
- "do_sample":true,
- "temperature":0.8,
- "repetition_penalty":1.10,
- "no_repeat_ngram_size":4,
- "epsilon_cutoff":0.0006,
- "renormalize_logits":true,
- "transformers_version":"4.40.1"
- }
+ "_from_model_config": true,
+ "bos_token_id": 128000,
+ "do_sample": true,
+ "eos_token_id": 128001,
+ "epsilon_cutoff": 0.0006,
+ "max_new_tokens": 64,
+ "no_repeat_ngram_size": 4,
+ "renormalize_logits": true,
+ "repetition_penalty": 1.1,
+ "temperature": 0.8,
+ "transformers_version": "4.40.2"
+ }
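
Apart from the `transformers_version` bump (4.40.1 → 4.40.2), this diff is a pure re-serialization: keys sorted alphabetically, a space after each colon, and `1.10` normalized to `1.1` by the JSON float round-trip. A small stdlib sketch that reproduces the "+" side from the "-" side:

```python
import json

# The pre-commit config, in its original key order, with 1.10 spelled out.
old_config = {
    "_from_model_config": True,
    "bos_token_id": 128000,
    "eos_token_id": 128001,
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.8,
    "repetition_penalty": 1.10,
    "no_repeat_ngram_size": 4,
    "epsilon_cutoff": 0.0006,
    "renormalize_logits": True,
    "transformers_version": "4.40.1",
}

# Apply the one real change, then re-serialize with sorted keys; this
# yields alphabetical ordering and 1.10 -> 1.1, as in the "+" lines above.
old_config["transformers_version"] = "4.40.2"
new_text = json.dumps(old_config, indent=2, sort_keys=True)
print(new_text)
```

Sorted-key serialization is what libraries typically do when they save a config, which is why reordering like this shows up in diffs even when only one value changed.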
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:3a6304e0b05c6aaf69d860292712ffc3cad560249764060eaae516beeb0acb5b
3
  size 2098133608
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b1182fb2b1de2e10a2ff5efde8876525526e7c1cc39856f53b0a42ccd50eed9c
3
  size 2098133608
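
The `model.safetensors` entry is a Git LFS pointer file, not the weights themselves: only the content hash (`oid`) changes, while the byte size is identical, i.e. the checkpoint was overwritten by weights of the same shape. A sketch of parsing such a pointer with the stdlib:

```python
# A Git LFS pointer stores "key value" lines describing the real artifact.
new_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:b1182fb2b1de2e10a2ff5efde8876525526e7c1cc39856f53b0a42ccd50eed9c
size 2098133608
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


info = parse_lfs_pointer(new_pointer)
print(info["oid"])              # the new sha256 content digest
print(int(info["size"]) / 1e9)  # 2.098133608, i.e. about 2.1 GB of weights
```

Because the pointer is tiny, the repository history stays small even though every "Model save" commit replaces ~2.1 GB of weights on the LFS store.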