Tohrumi committed · Commit 1876682 · verified · 1 Parent(s): cbfc655

#1 Test trainer save

Files changed (2)
  1. README.md +59 -0
  2. logs.json +46 -0
README.md ADDED
@@ -0,0 +1,59 @@
+ ---
+ license: apache-2.0
+ library_name: peft
+ tags:
+ - trl
+ - sft
+ - translation
+ - generated_from_trainer
+ base_model: mistralai/Mistral-7B-v0.1
+ model-index:
+ - name: MistralAI_iwslt15_en_vi_manual
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # MistralAI_iwslt15_en_vi_manual
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.002
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 1
+ - num_epochs: 1
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+
+
+ ### Framework versions
+
+ - PEFT 0.10.0
+ - Transformers 4.39.3
+ - Pytorch 2.2.1
+ - Datasets 2.18.0
+ - Tokenizers 0.15.2
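The hyperparameters listed in the README above map onto a TRL SFT setup roughly as follows. This is a minimal sketch, not the training script from this commit: the dataset loading (the model name suggests IWSLT'15 English-Vietnamese), the LoRA settings, and the `dataset_text_field` column are assumptions, while the `TrainingArguments` values are taken from the card.

```python
# Sketch of a TRL SFT run matching the card's hyperparameters.
# Assumed (not in the commit): dataset, LoRA config, column/output names.
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# Placeholder: the card says "unknown dataset"; the model name hints at IWSLT'15 en-vi.
train_dataset = load_dataset("json", data_files="iwslt15_en_vi_train.json")["train"]

peft_config = LoraConfig(  # hypothetical adapter settings; the card lists none
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
)

args = TrainingArguments(
    output_dir="MistralAI_iwslt15_en_vi_manual",
    learning_rate=2e-3,             # learning_rate: 0.002
    per_device_train_batch_size=8,  # train_batch_size: 8
    per_device_eval_batch_size=8,   # eval_batch_size: 8
    seed=42,                        # seed: 42
    lr_scheduler_type="linear",     # lr_scheduler_type: linear
    warmup_steps=1,                 # lr_scheduler_warmup_steps: 1
    num_train_epochs=1,             # num_epochs: 1
    fp16=True,                      # mixed_precision_training: Native AMP
    logging_steps=25,               # matches the 25-step interval in logs.json
)

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",  # base_model from the card
    args=args,
    train_dataset=train_dataset,
    dataset_text_field="text",          # assumed column name
    peft_config=peft_config,
)
trainer.train()
```

The `TrainingArguments` defaults already give Adam with betas=(0.9,0.999) and epsilon=1e-08, so the optimizer line in the card needs no extra code. Note that 125 steps at batch size 8 implies roughly 1,000 training examples, consistent with `train_samples_per_second × train_runtime` in logs.json.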
logs.json ADDED
@@ -0,0 +1,46 @@
+ [
+   {
+     "loss": 1.5437,
+     "grad_norm": 1.134100079536438,
+     "learning_rate": 0.0016129032258064516,
+     "epoch": 0.2,
+     "step": 25
+   },
+   {
+     "loss": 1.5328,
+     "grad_norm": 1.6741456985473633,
+     "learning_rate": 0.0012096774193548388,
+     "epoch": 0.4,
+     "step": 50
+   },
+   {
+     "loss": 1.443,
+     "grad_norm": 0.8672377467155457,
+     "learning_rate": 0.0008064516129032258,
+     "epoch": 0.6,
+     "step": 75
+   },
+   {
+     "loss": 1.46,
+     "grad_norm": 1.14051353931427,
+     "learning_rate": 0.0004032258064516129,
+     "epoch": 0.8,
+     "step": 100
+   },
+   {
+     "loss": 1.3072,
+     "grad_norm": 0.6216816902160645,
+     "learning_rate": 0.0,
+     "epoch": 1.0,
+     "step": 125
+   },
+   {
+     "train_runtime": 1303.9515,
+     "train_samples_per_second": 0.767,
+     "train_steps_per_second": 0.096,
+     "total_flos": 8860503771119616.0,
+     "train_loss": 1.4573484497070313,
+     "epoch": 1.0,
+     "step": 125
+   }
+ ]
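The entries above follow the Transformers trainer's `log_history` format: five training records logged every 25 steps, plus a final summary record. Below is a small self-contained sketch for sanity-checking this file; only the file name `logs.json` is taken from the commit.

```python
import json

# Load the trainer log history committed as logs.json.
with open("logs.json") as f:
    records = json.load(f)

# Step-level records carry "loss"; the last record is the run summary.
step_logs = [r for r in records if "loss" in r]
summary = records[-1]

for r in step_logs:
    print(f"step {r['step']:>3}  epoch {r['epoch']:.1f}  "
          f"loss {r['loss']:.4f}  lr {r['learning_rate']:.2e}")

mean_loss = sum(r["loss"] for r in step_logs) / len(step_logs)
print(f"mean of logged losses: {mean_loss:.4f}")
print(f"reported train_loss:   {summary['train_loss']:.4f}")
print(f"runtime: {summary['train_runtime']:.0f}s "
      f"({summary['train_samples_per_second']:.3f} samples/s)")
```

As a quick consistency check: the reported `train_loss` (1.4573) equals the mean of the five logged losses, and the learning rate decays linearly from near 0.002 to 0.0 across the 125 steps, matching the scheduler settings in the README.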