Undi95 committed
Commit 097dea8 · verified · 1 Parent(s): aa3084f

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,151 @@
---
library_name: transformers
tags:
- generated_from_trainer
datasets:
- 2025-01_conversations_truncated.jsonl
model-index:
- name: outputs/
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.6.0`
```yaml
base_model: ./meta-llama_Llama-3.2-3B
# optionally might have model_type or tokenizer_type
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: 2025-01_conversations_truncated.jsonl
    type: chat_template
    chat_template: llama3
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    roles:
      user:
        - human
      assistant:
        - gpt
      system:
        - system
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/
dataset_prepared_path: last_run_prepared

sequence_len: 4096
eval_sample_packing: false
sample_packing: true
pad_to_sequence_len: true

wandb_project: JVCGPT Light 3b base
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.000007

train_on_inputs: true
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: unsloth
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 100
eval_table_size:
saves_per_epoch: 2
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
save_safetensors: true
save_total_limit: 10
```

</details><br>

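With axolotl `0.6.0`, a config like the one above is typically run through the axolotl CLI (for example `accelerate launch -m axolotl.cli.train config.yml`); the exact command used for this run is not recorded in this card, so treat that invocation as illustrative.
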
# outputs/

This model was fine-tuned from `./meta-llama_Llama-3.2-3B` (Llama 3.2 3B) on the 2025-01_conversations_truncated.jsonl dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1520

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

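The `datasets` block in the config above reads `conversations` records whose messages carry `from`/`value` fields, mapping the `system`, `human`, and `gpt` tags to the system, user, and assistant roles of the llama3 chat template. As a rough sketch, one line of 2025-01_conversations_truncated.jsonl is therefore expected to look like the record below (field names come from the config; the message text is invented for illustration):

```python
import json

# One illustrative record in the layout the chat_template loader is configured for.
# Field names ("conversations", "from", "value") and role tags ("system", "human", "gpt")
# come from the axolotl config above; the message content itself is made up.
record = {
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "human", "value": "Hello, how are you today?"},
        {"from": "gpt", "value": "I'm doing well, thanks for asking!"},
    ]
}

# The dataset stores one such JSON object per line (.jsonl).
print(json.dumps(record, ensure_ascii=False))
```
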
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 7e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32 (see the quick check after this list)
- total_eval_batch_size: 8
- optimizer: PAGED_ADAMW_8BIT with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 4

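The effective batch sizes above follow from the per-device batch size, gradient accumulation, and device count; a quick sanity check:

```python
# Effective batch sizes implied by the hyperparameters listed above.
micro_batch_size = 2             # per-device train/eval batch size
gradient_accumulation_steps = 4
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no gradient accumulation at eval time

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 8
```
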
### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.6055        | 1.0006 | 789  | 1.1893          |
| 0.5619        | 2.0006 | 1578 | 1.1576          |
| 0.4873        | 3.0006 | 2367 | 1.1522          |
| 1.2133        | 3.9917 | 3148 | 1.1520          |

### Framework versions

- Transformers 4.47.1
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
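
A minimal inference sketch, assuming the uploaded weights and tokenizer are loaded with the pinned Transformers version and that the tokenizer ships the llama3 chat template used during training; the repository id and the prompt below are placeholders, not values recorded in this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "username/model-name"  # placeholder: replace with this repo's id or a local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16; use float32 if bf16 is unavailable
    device_map="auto",
)

# Roles mirror the system/human/gpt mapping used at training time.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say hello in one sentence."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
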
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b2e3f36e54ba8ad67dff82546b6c26ef4ce154c3e0cf26118eb240e0f7261bbb
+ oid sha256:52204f26c5f5cd238eb672201051abc1a7a7b7faffd24921cf96019d540a7df7
  size 4965799096
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4ca83797f0bb9e75400adeb2e7b7de1803782e09f9c5a3b8de9a9b6bf0dcb15a
+ oid sha256:99ffd4500563733046906713c6c0be4e90ac419e77e1d43e0b4f334e93d215d2
  size 1459729952