---
license: apache-2.0
datasets:
- huihui-ai/QWQ-LONGCOT-500K
- huihui-ai/LONGCOT-Refine-500K
base_model:
- huihui-ai/Llama-3.2-1B-Instruct-abliterated
---
# MicroThinker-1B-Preview

MicroThinker-1B-Preview is a new model fine-tuned from [huihui-ai/Llama-3.2-1B-Instruct-abliterated](https://huggingface.co/huihui-ai/Llama-3.2-1B-Instruct-abliterated).

## Training Details

This is a preview release, but the performance is already quite good. The model is still being fine-tuned; a final version will follow soon.

The test environment is described below.

The model was trained on a single RTX 4090 GPU (24 GB).

The fine-tuning process used only 20,000 records from each dataset.

The [SFT (Supervised Fine-Tuning)](https://github.com/modelscope/ms-swift) process with ms-swift is divided into the following steps; no custom training code needs to be written.
1. Create the environment.

```
conda create -yn ms-swift python=3.11
conda activate ms-swift

mkdir MicroThinker-1B-Preview
cd MicroThinker-1B-Preview

git clone https://github.com/modelscope/ms-swift.git

cd ms-swift
pip install -e .
cd ..
```
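
A quick sanity check can confirm the install succeeded (this assumes the `ms-swift` package name and the `swift` CLI entry point that the later steps rely on):

```
pip show ms-swift
swift sft --help | head -n 20
```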


2. Download the model and dataset.

```
huggingface-cli download huihui-ai/Llama-3.2-1B-Instruct-abliterated --local-dir ./huihui-ai/Llama-3.2-1B-Instruct-abliterated
huggingface-cli download --repo-type  dataset huihui-ai/QWQ-LONGCOT-500K --local-dir ./data/QWQ-LONGCOT-500K
huggingface-cli download --repo-type  dataset huihui-ai/LONGCOT-Refine-500K --local-dir ./data/LONGCOT-Refine-500K
```
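
To verify the downloads, a line count of the two JSONL files (paths as used in the training commands below) shows how many records are available; only 20,000 from each are used here:

```
wc -l data/QWQ-LONGCOT-500K/qwq_500k.jsonl
wc -l data/LONGCOT-Refine-500K/refine_from_qwen2_5.jsonl
```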


3. Train first with only the huihui-ai/QWQ-LONGCOT-500K dataset (`#20000` samples 20,000 records), for 1 epoch:

```
swift sft --model huihui-ai/Llama-3.2-1B-Instruct-abliterated --model_type llama3_2 --train_type lora --dataset "data/QWQ-LONGCOT-500K/qwq_500k.jsonl#20000" --torch_dtype bfloat16 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --learning_rate 1e-4 --lora_rank 8 --lora_alpha 32 --target_modules all-linear --gradient_accumulation_steps 16 --eval_steps 50 --save_steps 50 --save_total_limit 2 --logging_steps 5  --max_length 16384  --output_dir output/Llama-3.2-1B-Instruct-abliterated/lora/sft --system "You are a helpful assistant. You should think step-by-step." --warmup_ratio 0.05 --dataloader_num_workers 4 --model_author "huihui-ai" --model_name "huihui-ai-robot"
```
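
Checkpoints are written under the `--output_dir` given above, inside a timestamped run folder (the `v0-...` name shown in step 4 will differ on your machine). Listing the directory is an easy way to find the latest checkpoint:

```
ls -lt output/Llama-3.2-1B-Instruct-abliterated/lora/sft/
```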


4. Save the fine-tuned model (merge the LoRA adapter into the base weights).
Replace the directories below with the checkpoint path produced by your own run.

```
swift infer --model huihui-ai/Llama-3.2-1B-Instruct-abliterated --adapters output/Llama-3.2-1B-Instruct-abliterated/lora/sft/v0-20250102-153619/checkpoint-1237 --merge_lora true 
```


This should create a new model directory, `checkpoint-1237-merged`. Copy or move it to the `huihui` directory, for example as shown below.
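
A possible copy command (assuming the merged weights are written next to the adapter checkpoint from step 4; adjust the run folder to match your own):

```
mkdir -p huihui
cp -r output/Llama-3.2-1B-Instruct-abliterated/lora/sft/v0-20250102-153619/checkpoint-1237-merged huihui/
```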

5. Perform inference on the fine-tuned model.

```
swift infer --model huihui/checkpoint-1237-merged --stream true --infer_backend pt --max_new_tokens 8192
```


6. Continue training the merged model with the huihui-ai/QWQ-LONGCOT-500K (`#20000`) and huihui-ai/LONGCOT-Refine-500K (`#20000`) datasets combined, for 1 epoch:

```
swift sft --model huihui/checkpoint-1237-merged --model_type llama3_2 --train_type lora --dataset "data/QWQ-LONGCOT-500K/qwq_500k.jsonl#20000" "data/LONGCOT-Refine-500K/refine_from_qwen2_5.jsonl#20000" --torch_dtype bfloat16 --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --learning_rate 1e-4 --lora_rank 8 --lora_alpha 32 --target_modules all-linear --gradient_accumulation_steps 16 --eval_steps 50 --save_steps 50 --save_total_limit 2 --logging_steps 5 --max_length 16384 --output_dir output/Llama-3.2-1B-Instruct-abliterated/lora/sft2 --system "You are a helpful assistant. You should think step-by-step." --warmup_ratio 0.05 --dataloader_num_workers 4 --model_author "huihui-ai" --model_name "huihui-ai-robot"
```


7. Save the final fine-tuned model (merge the second LoRA adapter into the base weights).
Replace the directories below with the checkpoint path produced by your own run.

```
swift infer --model huihui/checkpoint-1237-merged --adapters output/Llama-3.2-1B-Instruct-abliterated/lora/sft2/v0-20250103-121319/checkpoint-1237 --merge_lora true
```


This should create a new model directory, `checkpoint-1237-merged`. Rename it to `MicroThinker-1B-Preview`, then copy or move it to the `huihui` directory, for example as shown below.
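
A possible rename-and-move command (again assuming the merged weights sit next to the adapter checkpoint from step 7; adjust the run folder to match your own):

```
mv output/Llama-3.2-1B-Instruct-abliterated/lora/sft2/v0-20250103-121319/checkpoint-1237-merged huihui/MicroThinker-1B-Preview
```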

8. Perform inference on the final fine-tuned model.

```
swift infer --model huihui/MicroThinker-1B-Preview --stream true --infer_backend pt --max_new_tokens 8192
```