Built with Axolotl

3e503c6c-35e3-4e36-97fd-3ca1b6490aa2

This model is a fine-tuned version of katuni4ka/tiny-random-qwen1.5-moe; the training dataset is not identified in this card. It achieves the following result on the evaluation set:

  • Loss: 11.7886
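
For context, a mean cross-entropy loss converts to perplexity via exp(loss). A minimal sketch, assuming the reported value is a per-token cross-entropy in nats (the transformers Trainer convention):

```python
import math

# Assumption: the reported eval loss is a mean per-token cross-entropy in nats.
eval_loss = 11.7886
perplexity = math.exp(eval_loss)  # on the order of 1.3e5 for this tiny random base model
print(f"perplexity = {perplexity:.0f}")
```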

Model description

More information needed

Intended uses & limitations

More information needed
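
Pending usage details from the author, a minimal loading sketch with peft and transformers; the base and adapter repository ids are taken from this card, and the snippet assumes a standard PEFT adapter layout:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach this PEFT adapter on top of it.
base_id = "katuni4ka/tiny-random-qwen1.5-moe"
adapter_id = "lesso12/3e503c6c-35e3-4e36-97fd-3ca1b6490aa2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```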

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 0.000212
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 120
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: adamw_bnb_8bit (bitsandbytes AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
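
For reference, a hypothetical reconstruction of these settings with the transformers TrainingArguments API; the actual run was driven by an Axolotl config that is not included in this card, and output_dir below is an assumed placeholder:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",            # assumed; not stated in the card
    learning_rate=0.000212,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=120,
    gradient_accumulation_steps=2,   # 4 * 2 = total train batch size of 8
    optim="adamw_bnb_8bit",          # bitsandbytes 8-bit AdamW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=500,
)
```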

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0002 | 1    | 11.9334         |
| 11.8704       | 0.0082 | 50   | 11.8842         |
| 11.8571       | 0.0163 | 100  | 11.8636         |
| 11.8176       | 0.0245 | 150  | 11.8292         |
| 11.8139       | 0.0327 | 200  | 11.8095         |
| 11.8127       | 0.0408 | 250  | 11.7997         |
| 11.8078       | 0.0490 | 300  | 11.7957         |
| 11.7966       | 0.0572 | 350  | 11.7917         |
| 11.795        | 0.0653 | 400  | 11.7895         |
| 11.7874       | 0.0735 | 450  | 11.7888         |
| 11.7957       | 0.0817 | 500  | 11.7886         |

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
