Built with Axolotl

75934fce-c53d-4ed4-a8b8-ce30644d4408

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B on an unspecified dataset (logged as None by the training framework). It achieves the following results on the evaluation set:

  • Loss: 0.1964
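
Since this checkpoint is a PEFT adapter for Qwen/Qwen2.5-1.5B (see Framework versions below), a minimal loading sketch along the following lines should work; the prompt and generation settings are placeholders, not part of this card.

```python
# A minimal loading sketch, assuming this repo hosts a PEFT (likely LoRA)
# adapter for Qwen/Qwen2.5-1.5B; the prompt below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")

# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, "lesso08/75934fce-c53d-4ed4-a8b8-ce30644d4408")

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```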

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):

  • learning_rate: 0.000208
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: AdamW (8-bit, via bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 483
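
The sketch below restates these settings as Hugging Face TrainingArguments. The run itself was produced by Axolotl, so treat this as an approximation of the effective settings, not the original config file; output_dir is hypothetical.

```python
# A hedged reconstruction of the hyperparameters above; not the original
# Axolotl config. Values are taken directly from the list in this card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",            # hypothetical path
    learning_rate=0.000208,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,   # total train batch size: 4 * 2 = 8
    optim="adamw_bnb_8bit",          # bitsandbytes AdamW; default betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=483,
)
```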

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0021 | 1    | 1.8547          |
| 0.5027        | 0.1035 | 50   | 0.2691          |
| 0.3573        | 0.2070 | 100  | 0.2200          |
| 0.3111        | 0.3106 | 150  | 0.2220          |
| 0.4188        | 0.4141 | 200  | 0.2153          |
| 0.2728        | 0.5176 | 250  | 0.2107          |
| 0.2365        | 0.6211 | 300  | 0.2101          |
| 0.2971        | 0.7246 | 350  | 0.2001          |
| 0.3151        | 0.8282 | 400  | 0.1974          |
| 0.3327        | 0.9317 | 450  | 0.1964          |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model tree for lesso08/75934fce-c53d-4ed4-a8b8-ce30644d4408

  • Base model: Qwen/Qwen2.5-1.5B
  • This model: an adapter of the base model (one of 368 adapters listed for it)