---
library_name: peft
base_model: katuni4ka/tiny-random-qwen1.5-moe
tags:
- axolotl
- generated_from_trainer
model-index:
- name: 3ee349ea-a42b-4ba1-9eca-0cb04a55a667
  results: []
---

Built with Axolotl

# 3ee349ea-a42b-4ba1-9eca-0cb04a55a667

This model is a fine-tuned version of katuni4ka/tiny-random-qwen1.5-moe on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 11.7920
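For intuition, the evaluation loss can be converted to perplexity (a sketch; it assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal-LM training):

```python
import math

eval_loss = 11.7920  # final evaluation loss reported above

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:,.0f}")
```

A perplexity this high (on the order of 10^5) is expected here: the base model is a tiny, randomly initialized test checkpoint, so the adapter has essentially no language ability to build on.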

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.000211
- train_batch_size: 4
- eval_batch_size: 4
- seed: 110
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: ADAMW_BNB (8-bit AdamW via bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- training_steps: 500
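These hyperparameters can be cross-checked with a few lines of Python: the effective batch size is the per-device batch size times the gradient-accumulation steps, and the learning-rate curve is a linear warmup followed by cosine decay to zero (a sketch mirroring the behaviour of `transformers`' cosine schedule with warmup; the exact schedule used is an assumption):

```python
import math

learning_rate = 0.000211
warmup_steps = 50
training_steps = 500
train_batch_size = 4
gradient_accumulation_steps = 2

# Effective (total) train batch size: 4 * 2 = 8, matching the value above.
effective_batch_size = train_batch_size * gradient_accumulation_steps

def lr_at(step):
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_at(0)` is 0, `lr_at(50)` is the peak rate 0.000211, and `lr_at(500)` has decayed to (numerically) zero.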

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0002 | 1    | 11.9334         |
| 11.8775       | 0.0082 | 50   | 11.8840         |
| 11.8536       | 0.0163 | 100  | 11.8639         |
| 11.8233       | 0.0245 | 150  | 11.8314         |
| 11.8236       | 0.0327 | 200  | 11.8154         |
| 11.8038       | 0.0408 | 250  | 11.8065         |
| 11.7963       | 0.0490 | 300  | 11.7977         |
| 11.8009       | 0.0572 | 350  | 11.7945         |
| 11.7954       | 0.0653 | 400  | 11.7928         |
| 11.7993       | 0.0735 | 450  | 11.7921         |
| 11.7991       | 0.0817 | 500  | 11.7920         |
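The validation-loss column above can be sanity-checked with a short script (step/loss pairs copied from the table): it decreases at every logged step, but the total improvement over 500 steps is small.

```python
# (step, validation_loss) pairs from the training-results table above.
val_losses = [
    (1, 11.9334), (50, 11.8840), (100, 11.8639), (150, 11.8314),
    (200, 11.8154), (250, 11.8065), (300, 11.7977), (350, 11.7945),
    (400, 11.7928), (450, 11.7921), (500, 11.7920),
]

losses = [loss for _, loss in val_losses]

# Validation loss is strictly decreasing across all logged evaluations.
assert all(a > b for a, b in zip(losses, losses[1:]))

total_drop = losses[0] - losses[-1]
print(f"total improvement: {total_drop:.4f} nats "
      f"({100 * total_drop / losses[0]:.2f}%)")
```

The drop is about 0.14 nats (roughly 1.2%), consistent with fine-tuning a tiny random base model: the run converges, but the model remains near its starting loss.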

### Framework versions

- PEFT 0.13.2
- Transformers 4.46.0
- PyTorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1