SmallThinker-3B-Preview

We introduce SmallThinker-3B-Preview, a new model fine-tuned from Qwen2.5-3B-Instruct.

You can now deploy SmallThinker directly on your phone with PowerServe.

Benchmark Performance

| Model | AIME24 | AMC23 | GAOKAO2024_I | GAOKAO2024_II | MMLU_STEM | AMPS_Hard | math_comp |
|---|---|---|---|---|---|---|---|
| Qwen2.5-3B-Instruct | 6.67 | 45 | 50 | 35.8 | 59.8 | - | - |
| SmallThinker | 16.667 | 57.5 | 64.2 | 57.1 | 68.2 | 70 | 46.8 |
| GPT-4o | 9.3 | - | - | - | 64.2 | 57 | 50 |

Limitation: Due to SmallThinker's current limitations in instruction following, we adopt a more lenient evaluation for math_comp: a response counts as correct if it gives the right answer, without being required to follow the specified AAAAA format.
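
To make the lenient check concrete, here is a hypothetical sketch of such a scorer; the function name and regex are illustrative and are not the actual evaluation harness:

```python
# Hypothetical "lenient" math_comp scoring: accept a response if its final
# stated choice matches the gold letter, instead of requiring the strict
# five-letter format (e.g. "CCCCC"). Names here are illustrative only.
import re

def lenient_match(response: str, gold_letter: str) -> bool:
    """True if the last standalone choice letter in the response equals gold."""
    letters = re.findall(r"\b([A-E])\1*\b", response.upper())
    return bool(letters) and letters[-1] == gold_letter.upper()

print(lenient_match("After simplifying, the answer is (C).", "C"))  # True
print(lenient_match("CCCCC", "C"))                                  # True
```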

Colab Link: Colab

Intended Use Cases

SmallThinker is designed for the following use cases:

  1. Edge Deployment: Its small size makes it ideal for deployment on resource-constrained devices.
  2. Draft Model for QwQ-32B-Preview: SmallThinker can serve as a fast and efficient draft model for the larger QwQ-32B-Preview model. In our tests with llama.cpp, speculative decoding with SmallThinker as the draft raised generation speed from 40 tokens/s to 70 tokens/s, a roughly 75% speedup (see the sketch below).
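
The figures above were measured with llama.cpp's speculative decoding. As an illustrative alternative (not the setup used for those measurements), the same idea can be sketched with Hugging Face transformers' assisted generation, assuming both models fit in available GPU memory and share the Qwen2.5 tokenizer:

```python
# Sketch: SmallThinker as a draft (assistant) model for QwQ-32B-Preview via
# transformers assisted generation (speculative decoding). Illustrative only;
# the speedup figures above were measured with llama.cpp, not this code.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B-Preview")
target = AutoModelForCausalLM.from_pretrained(
    "Qwen/QwQ-32B-Preview", torch_dtype="auto", device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    "PowerInfer/SmallThinker-3B-Preview", torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(target.device)

# The draft proposes several tokens per step and the target verifies them,
# so the output distribution matches that of the target model alone.
outputs = target.generate(inputs, assistant_model=draft, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```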

Training Details

The model was trained on 8 H100 GPUs with a global batch size of 16.

The SFT (Supervised Fine-Tuning) process was conducted in two phases, with the configuration used in each phase shown below:

  1. First Phase:
    • Used only the PowerInfer/QWQ-LONGCOT-500K dataset
    • Trained for 1.5 epochs
### model
model_name_or_path: /home/syx/Qwen2.5-3B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: o1-v2  # local dataset alias; per the phase description above, this corresponds to PowerInfer/QWQ-LONGCOT-500K
template: qwen
neat_packing: true
cutoff_len: 16384
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2-01-qat/full/sft
logging_steps: 1
save_steps: 1000
plot_loss: true
overwrite_output_dir: true
  2. Second Phase:
    • Combined training with PowerInfer/QWQ-LONGCOT-500K and PowerInfer/LONGCOT-Refine datasets
    • Continued training for 2 additional epochs
### model
model_name_or_path: saves/qwen2-01-qat/full/sft/checkpoint-24000

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: o1-v2, o1-v3  # local aliases; per the phase description above, these correspond to PowerInfer/QWQ-LONGCOT-500K and PowerInfer/LONGCOT-Refine
template: qwen
neat_packing: true
cutoff_len: 16384
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2-01-qat/full/sft
logging_steps: 1
save_steps: 1000
plot_loss: true
overwrite_output_dir: true
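
These YAML files appear to follow the LLaMA-Factory training-config format (the stage, finetuning_type, template, and deepspeed keys are its conventions); assuming that toolchain, each phase is typically launched with llamafactory-cli train <config>.yaml.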

Limitations & Disclaimer

Please be aware of the following limitations:

  • Language Limitation: The model has been trained only on English-language datasets, so its capabilities in other languages remain limited.
  • Limited Knowledge: Due to limited SFT data and the model's relatively small scale, its reasoning capabilities are constrained by its knowledge base.
  • Unpredictable Outputs: The model may produce unexpected outputs due to its size and probabilistic generation paradigm. Users should exercise caution and validate its responses.
  • Repetition Issue: The model tends to repeat itself when answering high-difficulty questions. Increase repetition_penalty to mitigate this (see the generation sketch below).
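
For reference, here is a minimal transformers generation sketch that raises repetition_penalty; the values shown are illustrative starting points, not tuned recommendations:

```python
# Minimal sketch: generate with an increased repetition_penalty to reduce loops.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PowerInfer/SmallThinker-3B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.1,  # illustrative starting point; raise if loops persist
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```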