error577 committed on
Commit ac2cfc4 · verified · 1 parent: d65d56f

End of training

README.md CHANGED
@@ -41,11 +41,11 @@ early_stopping_patience: null
 eval_max_new_tokens: 128
 eval_table_size: null
 evals_per_epoch: 1
-flash_attention: false
+flash_attention: true
 fp16: null
 fsdp: null
 fsdp_config: null
-gradient_accumulation_steps: 8
+gradient_accumulation_steps: 16
 gradient_checkpointing: true
 group_by_length: false
 hub_model_id: error577/202543ff-c8cb-4cd4-a8ca-da56264dae6e
@@ -82,7 +82,7 @@ tf32: false
 tokenizer_type: AutoTokenizer
 train_on_inputs: false
 trust_remote_code: true
-val_set_size: 0.02
+val_set_size: 0.001
 wandb_entity: null
 wandb_mode: online
 wandb_name: a79dc58a-b30f-4b42-a703-e31cc51332ed
@@ -101,7 +101,7 @@ xformers_attention: null
 
 This model is a fine-tuned version of [oopsung/llama2-7b-koNqa-test-v1](https://huggingface.co/oopsung/llama2-7b-koNqa-test-v1) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: nan
+- Loss: 2.1406
 
 ## Model description
 
@@ -124,8 +124,8 @@ The following hyperparameters were used during training:
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 8
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 16
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
@@ -135,7 +135,7 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 0.0 | 0.0020 | 100 | nan |
+| 1.4357 | 0.0038 | 100 | 2.1406 |
 
 
 ### Framework versions
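The README diff records a doubling of gradient_accumulation_steps from 8 to 16 at micro-batch size 1, which is why total_train_batch_size moves from 8 to 16. A minimal sketch (not from this repo; a toy scalar model is assumed) of why accumulating scaled micro-batch gradients reproduces the full-batch gradient:

```python
# Toy illustration of gradient accumulation:
# total_train_batch_size = train_batch_size * gradient_accumulation_steps.
# Model: scalar linear fit, loss = mean((w*x - y)^2), dL/dw = mean(2*(w*x - y)*x).

def grad_full_batch(w, xs, ys):
    # Gradient of the loss averaged over the whole batch at once.
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def grad_accumulated(w, xs, ys, accum_steps):
    # Micro-batch size 1, as in the config: process one sample at a time
    # and accumulate each per-sample gradient scaled by 1/accum_steps.
    assert len(xs) == accum_steps  # one optimizer step per accum_steps micro-batches
    g = 0.0
    for x, y in zip(xs, ys):
        micro_grad = 2 * (w * x - y) * x  # gradient of this sample's loss
        g += micro_grad / accum_steps     # scale as if averaging over the batch
    return g

xs = [float(i) for i in range(16)]
ys = [2.0 * x + 1.0 for x in xs]
w = 0.5
# The two gradients agree, so 16 accumulation steps at batch size 1
# behave like one step at batch size 16 (memory cost aside).
print(abs(grad_full_batch(w, xs, ys) - grad_accumulated(w, xs, ys, 16)) < 1e-9)  # True
```

The trade-off is wall-clock speed, not math: accumulation takes 16 forward/backward passes per optimizer step but only ever holds one sample's activations in memory.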
adapter_config.json CHANGED
@@ -20,13 +20,13 @@
     "rank_pattern": {},
     "revision": null,
     "target_modules": [
+        "o_proj",
+        "gate_proj",
         "v_proj",
         "up_proj",
-        "gate_proj",
+        "down_proj",
         "q_proj",
-        "o_proj",
-        "k_proj",
-        "down_proj"
+        "k_proj"
     ],
     "task_type": "CAUSAL_LM",
     "use_dora": false,
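Note that the target_modules edit above only reorders the JSON list; comparing the two sides as sets (lists copied from the diff) shows the same seven Llama-2 projections are LoRA-adapted before and after, since list order in adapter_config.json carries no meaning for module matching:

```python
# target_modules before and after the commit, copied from the diff above.
old = ["v_proj", "up_proj", "gate_proj", "q_proj", "o_proj", "k_proj", "down_proj"]
new = ["o_proj", "gate_proj", "v_proj", "up_proj", "down_proj", "q_proj", "k_proj"]

# Same set: all four attention projections (q,k,v,o) plus the three
# MLP projections (gate,up,down) of a Llama-2 block.
print(set(old) == set(new))  # True
```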
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:448d79c355f7f154f10031130f3112521c83938fb84dc4535dfc7ca4b6cbd6fe
+oid sha256:1793593aee751493d462b407e8dd4822d17c993111b28bec02f359232cdf6e82
 size 319977674
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9d8dc13d2518bb660921f4bd58e81b17ace1239b9852e7f4e49ddd8315091788
+oid sha256:9f6941aad94e301671ce4ec56f0ccceede6972fcaf8c253d4caa13024003889a
 size 319876032
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7ce62b2704cdc4ed29b6f2c9d03327e9b86048a8213f6988a75495efc7e32ae0
+oid sha256:a28dc31871e5040bd4850398ba09ae43aa80a5ef753a46b4245c30c1e1a1daac
 size 6776
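The binary diffs above are changes to Git LFS pointer files, not to the weights themselves: only the sha256 oid line differs, and the size stays identical because the retrained payload has the same byte length. A small sketch (a hypothetical helper, not part of this repo) of how such a pointer is derived from a payload:

```python
import hashlib

def lfs_pointer(payload: bytes) -> str:
    # A Git LFS pointer records the spec version, the payload's sha256
    # as the object id, and the payload length in bytes.
    oid = hashlib.sha256(payload).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(payload)}\n"
    )

print(lfs_pointer(b"hello"))
```

Two different adapter checkpoints of the same architecture and dtype serialize to the same number of bytes, hence the unchanged `size 319977674` with a new oid.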