Built with Axolotl

20911bf6-9c63-4262-b2cd-950d6229d81d

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1966
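
The card does not include a usage snippet, so here is a minimal inference sketch, assuming the adapter is hosted under lesso03/20911bf6-9c63-4262-b2cd-950d6229d81d (the repo this card belongs to) and follows PEFT's standard adapter-loading flow. The prompt, dtype, and generation settings are illustrative, not taken from the card:

```python
# Hedged inference sketch: load the Qwen2.5-1.5B base model, then attach
# this LoRA adapter with PEFT. The adapter repo id is assumed from the
# name of this model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-1.5B"
adapter_id = "lesso03/20911bf6-9c63-4262-b2cd-950d6229d81d"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```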

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough TrainingArguments equivalent is sketched after the list):

  • learning_rate: 0.000203
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: 8-bit AdamW via bitsandbytes (adamw_bnb_8bit) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 483
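
Axolotl drives training from its own YAML config, which is not included in this card. As a rough reference, the hyperparameters above map onto Hugging Face TrainingArguments approximately as follows; the output_dir and the exact field mapping are assumptions:

```python
# Hedged sketch: transformers TrainingArguments mirroring the list above.
# This is not the actual training script, only an approximate translation
# of the Axolotl-reported hyperparameters.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",               # placeholder path
    learning_rate=2.03e-4,              # 0.000203
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,      # 4 x 2 = total train batch size 8
    optim="adamw_bnb_8bit",             # 8-bit AdamW from bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=483,
)
```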

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0021 | 1    | 1.8547          |
| 0.5031        | 0.1035 | 50   | 0.2652          |
| 0.3578        | 0.2070 | 100  | 0.2268          |
| 0.3121        | 0.3106 | 150  | 0.2232          |
| 0.4215        | 0.4141 | 200  | 0.2163          |
| 0.273         | 0.5176 | 250  | 0.2088          |
| 0.2372        | 0.6211 | 300  | 0.2052          |
| 0.2979        | 0.7246 | 350  | 0.2003          |
| 0.3152        | 0.8282 | 400  | 0.1977          |
| 0.3327        | 0.9317 | 450  | 0.1966          |
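
Validation loss falls steeply over the first 50 steps and then improves only gradually, ending at 0.1966. A small sketch to visualize the curve, using only the values from the table above:

```python
# Plot the validation-loss curve from the results table (values copied
# verbatim from the card; no new data).
import matplotlib.pyplot as plt

steps = [1, 50, 100, 150, 200, 250, 300, 350, 400, 450]
val_loss = [1.8547, 0.2652, 0.2268, 0.2232, 0.2163,
            0.2088, 0.2052, 0.2003, 0.1977, 0.1966]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("Evaluation loss during fine-tuning")
plt.show()
```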

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • PyTorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
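
To reproduce this environment, pinning the listed releases should suffice; a requirements.txt sketch follows (note the +cu124 PyTorch build additionally requires installing from the CUDA 12.4 wheel index):

```text
peft==0.13.2
transformers==4.46.0
torch==2.5.0
datasets==3.0.1
tokenizers==0.20.1
```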