Delta-Vector committed on
Commit d8fc792 · verified · 1 Parent(s): 1fa8d93

Delete README.md

Files changed (1)
  1. README.md +0 -198
README.md DELETED
---
library_name: peft
tags:
- generated_from_trainer
base_model: NewEden_nemo-chatml
model-index:
- name: 12b-out-rslora-SE
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.5.0`
```yaml
## model
base_model: NewEden_nemo-chatml
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

## qlora (quantization disabled for this run)
load_in_8bit: false
load_in_4bit: false
strict: false

## data
datasets:
  - path: AquaV/c2-sharegpt-advanced-prefills-filtered
    type: danchat-advanced
  - path: AquaV/c1-sharegpt-advanced-prefills-filtered
    type: danchat-advanced
  - path: AquaV/rainy-sharegpt-advanced-prefills-filtered
    type: danchat-advanced
  - path: anthracite-core/Gryphe-Opus-Charcard-Roleplay
    type: danchat-advanced
  - path: anthracite-org/kalo-opus-instruct-22k-no-refusal
    type: danchat-advanced
  - path: lodrick-the-lafted/kalo-opus-instruct-3k-filtered
    type: danchat-advanced
  - path: anthracite-org/nopm_claude_writing_fixed
    type: danchat-advanced
  - path: anthracite-org/kalo_opus_misc_240827
    type: danchat-advanced
  - path: anthracite-org/kalo_misc_part2
    type: danchat-advanced
  - path: NewEden/Claude-Instruct-2.7K
    type: danchat-advanced
  - path: NewEden/Claude-Instruct-5K
    type: danchat-advanced
shuffle_merged_datasets: true
dataset_prepared_path: dataset_prepared
val_set_size: 0.02
output_dir: 12b-out-rslora-SE

## Liger kernels
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

## CTX settings
sequence_len: 16384
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

## LoRA
adapter: lora
lora_model_dir:
lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_rslora: true   # rank-stabilized LoRA; see the scaling note after the config
lora_modules_to_save:
  - embed_tokens
  - lm_head

## WandB
wandb_project: SE-mag-12B
wandb_entity:
wandb_watch:
wandb_name: daring-mango
wandb_log_model:

## evals
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128

## training hyperparameters
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: paged_ademamix_8bit
# optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2.83e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: unsloth
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 40
saves_per_epoch: 2
debug:
## for ademamix
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
## for adamw
# deepspeed: ./deepspeed_configs/zero3_bf16.json
weight_decay: 0.01
fsdp:
fsdp_config:
special_tokens:
  pad_token: <pad>

```

</details><br>
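
The config enables rank-stabilized LoRA (`peft_use_rslora: true`) with `lora_r: 128` and `lora_alpha: 16`. As a rough illustration (using the standard LoRA and rsLoRA scaling definitions, nothing specific to this repository), the effective adapter scaling works out quite differently under the two schemes:

```python
import math

# Values taken from the axolotl config above.
lora_r = 128
lora_alpha = 16

plain_lora_scale = lora_alpha / lora_r           # alpha / r       = 0.125
rslora_scale = lora_alpha / math.sqrt(lora_r)    # alpha / sqrt(r) ≈ 1.414

print(f"plain LoRA scaling: {plain_lora_scale:.3f}")
print(f"rsLoRA scaling:     {rslora_scale:.3f}")
```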
138
-
139
- # 12b-out-rslora-SE
140
-
141
- This model was trained from scratch on the None dataset.
142
- It achieves the following results on the evaluation set:
143
- - Loss: 1.0001
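
Since this repository holds a PEFT adapter rather than a merged checkpoint, loading it for inference would look roughly like the sketch below. The repo ids are assumptions taken from `base_model` and `output_dir` in the config; substitute the actual Hub ids for your setup.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed identifiers -- replace with the real base-model and adapter repo ids.
base_id = "NewEden_nemo-chatml"
adapter_id = "12b-out-rslora-SE"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Attach the LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```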
144
-
145
- ## Model description
146
-
147
- More information needed
148
-
149
- ## Intended uses & limitations
150
-
151
- More information needed
152
-
153
- ## Training and evaluation data
154
-
155
- More information needed
156
-
157
- ## Training procedure
158
-
159
- ### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2.83e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 16 (see the quick check after this list)
- total_eval_batch_size: 4
- optimizer: paged_ademamix_8bit (OptimizerNames.PAGED_ADEMAMIX_8BIT, no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 40
- num_epochs: 2
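
As a quick sanity check on the numbers above, the effective training batch size is just the product of the per-device micro batch size, the gradient-accumulation steps, and the number of devices:

```python
# All values come from the hyperparameter list above.
micro_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 16, matching total_train_batch_size above
```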

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 5.6935        | 0.0015 | 1    | 1.3433          |
| 4.2418        | 0.2498 | 166  | 1.0758          |
| 4.3565        | 0.4996 | 332  | 1.0486          |
| 4.0471        | 0.7494 | 498  | 1.0297          |
| 4.4088        | 0.9992 | 664  | 1.0146          |
| 3.596         | 1.2494 | 830  | 1.0110          |
| 3.4738        | 1.4992 | 996  | 1.0045          |
| 3.6522        | 1.7491 | 1162 | 1.0010          |
| 3.2321        | 1.9989 | 1328 | 1.0001          |

### Framework versions

- PEFT 0.13.2
- Transformers 4.46.1
- Pytorch 2.3.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.3
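
To check a local environment against these versions, a minimal sketch (the PyPI package names are assumptions, e.g. `torch` for the Pytorch entry):

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this model card, keyed by assumed PyPI package name.
expected = {
    "peft": "0.13.2",
    "transformers": "4.46.1",
    "torch": "2.3.1+cu121",
    "datasets": "3.0.1",
    "tokenizers": "0.20.3",
}

for package, wanted in expected.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}=={wanted}: not installed")
        continue
    status = "OK" if installed == wanted else f"differs (installed {installed})"
    print(f"{package}=={wanted}: {status}")
```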