Built with Axolotl

See the Axolotl config used for this run (axolotl version: 0.4.1):

adapter: lora
base_model: bigscience/bloomz-560m
bf16: true
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - 0fc9bc0ad0d49381_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/0fc9bc0ad0d49381_train_data.json
  type:
    field_input: input
    field_instruction: instruction
    field_output: output
    format: '{instruction} {input}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 2
eval_max_new_tokens: 128
eval_steps: 100
eval_table_size: null
flash_attention: false
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 8
gradient_checkpointing: true
group_by_length: false
hub_model_id: Romain-XV/d156f76d-6606-4266-96da-a418e1a226c2
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 128
lora_dropout: 0.3
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 64
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps: 8832
micro_batch_size: 4
mlflow_experiment_name: /tmp/0fc9bc0ad0d49381_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 2
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 100
sequence_len: 1024
strict: false
tf32: true
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
use_rslora: true
val_set_size: 0.00983438889107431
wandb_entity: null
wandb_mode: online
wandb_name: 7af274cb-a9a3-45b8-b6c3-b1c837d298d4
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 7af274cb-a9a3-45b8-b6c3-b1c837d298d4
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
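
The datasets entry above uses Axolotl's custom prompt format: each JSON record supplies instruction, input, and output fields, and the prompt is rendered with '{instruction} {input}' (or '{instruction}' alone when there is no input), with an empty system prompt. The sketch below illustrates that mapping; the example record is made up, since the actual contents of 0fc9bc0ad0d49381_train_data.json are not shown on this card.

```python
# Illustrative only: a hypothetical record and how the configured format strings
# ('{instruction} {input}' and '{instruction}') would render it into a prompt.
record = {
    "instruction": "Summarize the following paragraph.",
    "input": "Axolotl is a tool for fine-tuning language models from a YAML config.",
    "output": "Axolotl fine-tunes language models using a YAML config.",
}

fmt = "{instruction} {input}"
no_input_fmt = "{instruction}"

if record.get("input"):
    prompt = fmt.format(instruction=record["instruction"], input=record["input"])
else:
    prompt = no_input_fmt.format(instruction=record["instruction"])

# With train_on_inputs: false, the loss is computed only on the completion
# (the output field), not on the rendered prompt.
completion = record["output"]
print(prompt)
print(completion)
```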

d156f76d-6606-4266-96da-a418e1a226c2

This model is a LoRA adapter fine-tuned from bigscience/bloomz-560m on a custom JSON dataset (0fc9bc0ad0d49381_train_data.json; see the Axolotl config above). It achieves the following results on the evaluation set; a minimal loading example follows below:

  • Loss: 2.0438
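
A minimal sketch of loading the adapter for inference with PEFT and Transformers (the prompt is illustrative only; generation settings are not taken from this card):

```python
# Minimal loading sketch: attach the LoRA adapter from this repository to the
# bigscience/bloomz-560m base model and generate a completion.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "bigscience/bloomz-560m"
adapter_id = "Romain-XV/d156f76d-6606-4266-96da-a418e1a226c2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, adapter_id)  # loads the adapter weights
model.eval()

prompt = "Summarize the following paragraph. Axolotl is a tool for fine-tuning language models."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```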

Model description

This repository contains a LoRA adapter for bigscience/bloomz-560m, trained with Axolotl using rsLoRA (rank 64, alpha 128, dropout 0.3) applied to all linear layers; see the config above for full details.

Intended uses & limitations

More information needed

Training and evaluation data

Training used the custom JSON dataset referenced in the config above (0fc9bc0ad0d49381_train_data.json), with just under 1% of it (val_set_size ≈ 0.0098) held out as the evaluation split. More information about the dataset contents is needed.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: AdamW (8-bit, bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • training_steps: 8832
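
For reference, the total train batch size above follows from the per-device batch size and gradient accumulation (assuming the run used a single device, which is consistent with 4 × 8 = 32):

total_train_batch_size = micro_batch_size × gradient_accumulation_steps × num_devices = 4 × 8 × 1 = 32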

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 23.4576 | 0.0001 | 1 | 2.8683 |
| 19.7937 | 0.0064 | 100 | 2.4114 |
| 20.4259 | 0.0127 | 200 | 2.3545 |
| 18.2863 | 0.0191 | 300 | 2.3214 |
| 19.4042 | 0.0254 | 400 | 2.2988 |
| 18.4537 | 0.0318 | 500 | 2.2798 |
| 15.74 | 0.0381 | 600 | 2.2650 |
| 18.8956 | 0.0445 | 700 | 2.2526 |
| 17.846 | 0.0509 | 800 | 2.2435 |
| 17.2099 | 0.0572 | 900 | 2.2312 |
| 18.1726 | 0.0636 | 1000 | 2.2239 |
| 17.4762 | 0.0699 | 1100 | 2.2156 |
| 17.8853 | 0.0763 | 1200 | 2.2092 |
| 18.1147 | 0.0826 | 1300 | 2.2030 |
| 17.74 | 0.0890 | 1400 | 2.1963 |
| 18.1605 | 0.0953 | 1500 | 2.1899 |
| 18.1325 | 0.1017 | 1600 | 2.1850 |
| 18.0614 | 0.1081 | 1700 | 2.1802 |
| 17.757 | 0.1144 | 1800 | 2.1778 |
| 17.6561 | 0.1208 | 1900 | 2.1716 |
| 18.2847 | 0.1271 | 2000 | 2.1688 |
| 17.6908 | 0.1335 | 2100 | 2.1631 |
| 18.241 | 0.1398 | 2200 | 2.1593 |
| 16.251 | 0.1462 | 2300 | 2.1577 |
| 17.7695 | 0.1526 | 2400 | 2.1521 |
| 17.2459 | 0.1589 | 2500 | 2.1489 |
| 18.3262 | 0.1653 | 2600 | 2.1450 |
| 16.7641 | 0.1716 | 2700 | 2.1425 |
| 16.4128 | 0.1780 | 2800 | 2.1371 |
| 15.7316 | 0.1843 | 2900 | 2.1358 |
| 17.9185 | 0.1907 | 3000 | 2.1317 |
| 16.555 | 0.1971 | 3100 | 2.1280 |
| 15.8804 | 0.2034 | 3200 | 2.1261 |
| 17.6585 | 0.2098 | 3300 | 2.1229 |
| 17.4634 | 0.2161 | 3400 | 2.1184 |
| 17.5052 | 0.2225 | 3500 | 2.1192 |
| 17.4755 | 0.2288 | 3600 | 2.1174 |
| 18.0033 | 0.2352 | 3700 | 2.1110 |
| 16.3309 | 0.2415 | 3800 | 2.1089 |
| 16.633 | 0.2479 | 3900 | 2.1069 |
| 17.9653 | 0.2543 | 4000 | 2.1034 |
| 16.6872 | 0.2606 | 4100 | 2.1017 |
| 16.6698 | 0.2670 | 4200 | 2.0987 |
| 17.0016 | 0.2733 | 4300 | 2.0968 |
| 17.7949 | 0.2797 | 4400 | 2.0948 |
| 16.2796 | 0.2860 | 4500 | 2.0921 |
| 17.2325 | 0.2924 | 4600 | 2.0895 |
| 17.4596 | 0.2988 | 4700 | 2.0868 |
| 17.2106 | 0.3051 | 4800 | 2.0839 |
| 17.0064 | 0.3115 | 4900 | 2.0823 |
| 15.9642 | 0.3178 | 5000 | 2.0800 |
| 17.6006 | 0.3242 | 5100 | 2.0779 |
| 17.3074 | 0.3305 | 5200 | 2.0746 |
| 16.0723 | 0.3369 | 5300 | 2.0735 |
| 16.5184 | 0.3433 | 5400 | 2.0711 |
| 16.2517 | 0.3496 | 5500 | 2.0701 |
| 17.1206 | 0.3560 | 5600 | 2.0683 |
| 17.2825 | 0.3623 | 5700 | 2.0668 |
| 16.9153 | 0.3687 | 5800 | 2.0644 |
| 16.2446 | 0.3750 | 5900 | 2.0628 |
| 15.8944 | 0.3814 | 6000 | 2.0610 |
| 17.7732 | 0.3877 | 6100 | 2.0603 |
| 17.8103 | 0.3941 | 6200 | 2.0587 |
| 15.7341 | 0.4005 | 6300 | 2.0580 |
| 15.6502 | 0.4068 | 6400 | 2.0557 |
| 16.8526 | 0.4132 | 6500 | 2.0548 |
| 17.1581 | 0.4195 | 6600 | 2.0530 |
| 16.0818 | 0.4259 | 6700 | 2.0520 |
| 15.5948 | 0.4322 | 6800 | 2.0514 |
| 16.6084 | 0.4386 | 6900 | 2.0505 |
| 16.8273 | 0.4450 | 7000 | 2.0496 |
| 15.6169 | 0.4513 | 7100 | 2.0491 |
| 18.0275 | 0.4577 | 7200 | 2.0479 |
| 17.1104 | 0.4640 | 7300 | 2.0470 |
| 17.2611 | 0.4704 | 7400 | 2.0465 |
| 15.66 | 0.4767 | 7500 | 2.0461 |
| 15.8305 | 0.4831 | 7600 | 2.0450 |
| 15.9643 | 0.4895 | 7700 | 2.0455 |
| 17.4456 | 0.4958 | 7800 | 2.0441 |
| 16.9549 | 0.5022 | 7900 | 2.0445 |
| 15.9483 | 0.5085 | 8000 | 2.0437 |
| 15.8849 | 0.5149 | 8100 | 2.0439 |
| 16.6432 | 0.5212 | 8200 | 2.0436 |
| 16.6686 | 0.5276 | 8300 | 2.0434 |
| 16.7146 | 0.5339 | 8400 | 2.0431 |
| 16.051 | 0.5403 | 8500 | 2.0432 |
| 15.9274 | 0.5467 | 8600 | 2.0438 |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
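
A quick, optional check that a local environment matches the versions above (nothing here is specific to this model):

```python
# Print installed versions to compare against the list above
# (expected: PEFT 0.13.2, Transformers 4.46.0, PyTorch 2.5.0+cu124,
#  Datasets 3.0.1, Tokenizers 0.20.1).
import datasets, peft, tokenizers, torch, transformers

for name, module in [("peft", peft), ("transformers", transformers),
                     ("torch", torch), ("datasets", datasets),
                     ("tokenizers", tokenizers)]:
    print(f"{name}: {module.__version__}")
```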