See axolotl config

axolotl version: 0.4.1

adapter: lora
auto_resume_from_checkpoints: true
base_model: bigscience/bloomz-560m
bf16: auto
chat_template: llama3
dataset_prepared_path: null
dataset_processes: 6
datasets:
- data_files:
  - 31f811fb709cc914_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/31f811fb709cc914_train_data.json
  type:
    field_instruction: instruction
    field_output: response
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 3
eval_max_new_tokens: 128
eval_steps: 200
eval_table_size: null
evals_per_epoch: null
flash_attention: false
fp16: false
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: false
hub_model_id: error577/39d93b8a-13bf-40e9-8ba8-8d338e0337b1
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: null
micro_batch_size: 2
mlflow_experiment_name: /tmp/31f811fb709cc914_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 3
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 200
sequence_len: 512
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.005
wandb_entity: null
wandb_mode: online
wandb_name: 8a000dd1-5b3f-47db-9e70-f522ce6599ed
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 8a000dd1-5b3f-47db-9e70-f522ce6599ed
warmup_steps: 30
weight_decay: 0.0
xformers_attention: null
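
The scheduler settings in this config (learning_rate: 0.0002, lr_scheduler: cosine, warmup_steps: 30) describe a linear warmup followed by a cosine decay. The sketch below illustrates that shape under the usual warmup-plus-cosine formula. It is not taken from the training code, and `total_steps` is a hypothetical placeholder: the config leaves max_steps null, so the real horizon depends on the dataset size.

```python
import math


def cosine_lr(step, base_lr=2e-4, warmup_steps=30, total_steps=18_000):
    """Learning rate at `step` for linear warmup followed by cosine decay.

    base_lr and warmup_steps mirror the config; total_steps is an assumed
    placeholder, not a value reported for this run.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 15, 30, 9_015, 18_000):
    print(s, f"{cosine_lr(s):.2e}")
```

With these assumed values the rate peaks at 2e-4 right after warmup and falls to half of that at the midpoint of the decay.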

39d93b8a-13bf-40e9-8ba8-8d338e0337b1

This model is a fine-tuned version of bigscience/bloomz-560m on a custom JSON dataset (31f811fb709cc914_train_data.json; no dataset name was recorded). It achieves the following results on the evaluation set:

Loss: 1.0696

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: adamw_bnb_8bit (OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 30
num_epochs: 3
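
The total_train_batch_size of 8 follows from the per-device batch size and gradient accumulation listed above. A minimal sketch of the arithmetic, assuming a single training device (the card does not list a device count, so `num_devices=1` is an assumption):

```python
def effective_batch_size(micro_batch_size, grad_accum_steps, num_devices=1):
    """Samples contributing to each optimizer step.

    num_devices defaults to 1, which is an assumption; the card does not
    report how many devices were used.
    """
    return micro_batch_size * grad_accum_steps * num_devices


# train_batch_size: 2, gradient_accumulation_steps: 4
print(effective_batch_size(2, 4))  # 8
```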

Training results

Training Loss	Epoch	Step	Validation Loss
9.0252	0.0002	1	2.2089
5.8772	0.0329	200	1.5151
5.6189	0.0658	400	1.4449
6.4043	0.0987	600	1.3977
5.6258	0.1315	800	1.3686
6.5765	0.1644	1000	1.3478
5.3079	0.1973	1200	1.3205
5.0516	0.2302	1400	1.3033
4.9268	0.2631	1600	1.2868
5.4649	0.2960	1800	1.2725
4.0741	0.3288	2000	1.2626
5.1813	0.3617	2200	1.2504
4.7344	0.3946	2400	1.2434
4.3783	0.4275	2600	1.2340
5.2606	0.4604	2800	1.2265
5.4076	0.4933	3000	1.2197
4.6141	0.5261	3200	1.2121
4.6438	0.5590	3400	1.2015
4.9827	0.5919	3600	1.1986
6.4062	0.6248	3800	1.1911
4.0697	0.6577	4000	1.1877
5.0534	0.6906	4200	1.1814
4.7874	0.7234	4400	1.1786
5.2285	0.7563	4600	1.1764
4.7855	0.7892	4800	1.1699
4.3143	0.8221	5000	1.1654
4.8166	0.8550	5200	1.1595
5.2696	0.8879	5400	1.1548
4.0906	0.9207	5600	1.1515
4.5442	0.9536	5800	1.1503
4.3865	0.9865	6000	1.1437
3.3439	1.0194	6200	1.1433
5.4398	1.0523	6400	1.1440
3.1569	1.0852	6600	1.1406
4.5091	1.1181	6800	1.1336
5.2349	1.1509	7000	1.1311
4.2358	1.1838	7200	1.1323
4.442	1.2167	7400	1.1288
4.3978	1.2496	7600	1.1231
3.9429	1.2825	7800	1.1220
5.2279	1.3154	8000	1.1214
4.7596	1.3482	8200	1.1181
4.8692	1.3811	8400	1.1151
4.3599	1.4140	8600	1.1113
5.431	1.4469	8800	1.1069
3.6955	1.4798	9000	1.1041
4.7102	1.5127	9200	1.1054
4.4714	1.5455	9400	1.1023
3.4939	1.5784	9600	1.1004
5.278	1.6113	9800	1.0972
3.5237	1.6442	10000	1.0961
5.3808	1.6771	10200	1.0963
4.5247	1.7100	10400	1.0937
3.4588	1.7428	10600	1.0912
4.9685	1.7757	10800	1.0906
4.4331	1.8086	11000	1.0865
4.6026	1.8415	11200	1.0863
3.8171	1.8744	11400	1.0840
3.6165	1.9073	11600	1.0831
3.7015	1.9402	11800	1.0842
4.3536	1.9730	12000	1.0788
4.0382	2.0059	12200	1.0796
4.0658	2.0388	12400	1.0780
3.0832	2.0717	12600	1.0786
4.3379	2.1046	12800	1.0747
3.4001	2.1375	13000	1.0760
3.5611	2.1703	13200	1.0739
3.4944	2.2032	13400	1.0758
3.849	2.2361	13600	1.0736
5.0364	2.2690	13800	1.0747
4.4197	2.3019	14000	1.0696
4.6541	2.3348	14200	1.0716
5.5559	2.3676	14400	1.0697
4.3708	2.4005	14600	1.0696
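
The table ends at step 14600 (epoch 2.4005) rather than at the configured 3 epochs. That is consistent with early_stopping_patience: 3, since the last three evaluations do not beat the 1.0696 recorded at step 14000. The sketch below counts evaluations without improvement over the tail of the table. It works on the four-decimal values shown here and treats a tie as no improvement; whether that matches the trainer's exact stopping logic is an assumption.

```python
# Last five (step, validation loss) pairs copied from the table above.
tail = [
    (13800, 1.0747),
    (14000, 1.0696),
    (14200, 1.0716),
    (14400, 1.0697),
    (14600, 1.0696),
]


def evals_without_improvement(history):
    """Return (best_loss, consecutive evaluations since the best loss)."""
    best = float("inf")
    stale = 0
    for _, loss in history:
        if loss < best:
            # Strict improvement resets the patience counter.
            best, stale = loss, 0
        else:
            stale += 1
    return best, stale


best, stale = evals_without_improvement(tail)
print(best, stale)  # 1.0696 3
```

A count of 3 equals the configured patience, which would explain why training stopped at this evaluation.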

Framework versions

PEFT 0.13.2
Transformers 4.46.0
Pytorch 2.5.0+cu124
Datasets 3.0.1
Tokenizers 0.20.1
