thinker-1

obviously this isnt a proper model. if all goes well, ill make a better model

test qwen3-4b finetune. (trained on 480~ samples) outputs are structured, with tool calls inside the reasoning chain itself. steps are intended to replicate the openai steps-like responses from o1 earlier this year.

phases:

think token output
planning
steps based off of planning
tool call if needed
respond

no tokens were intentionally added, all of it learnt.

'added' tokens are:

<planning>
</plannning>
<step num=int>
</step>
<tool name=string>
</tool>
<tool_response>
</tool_response>

suggestions

prefill the planning token.

sample output

<think>
<planning>
I need to calculate the profit margin after selling a robe made from specific spools. 
1. Calculate the total cost of one robe.
2. Determine the revenue from selling one robe.
3. Calculate the profit margin by subtracting the total cost from the revenue.
</planning>

<step num=1>Calculating the total cost of one robe</step>
The robe requires 2 red spools and 1 blue spool. Each spool costs $3. Therefore, the total cost can be calculated as follows:
Total cost = (2 red spools × $3) + (1 blue spool × $3).

<tool name="python_code">
code: "red
_spools_cost = 2 * 3
blue_sp
ool_cost = 1 * 3
total_cost =
 red_spools_cost + blue_spool_cost
total_cost
"
</tool>
<tool_response>
The total cost of one robe is $9.
</tool_response>

<step num=2>Determining the revenue from selling one robe</step>
The selling price of one robe is given as $40.

<step num=3>Calculating the profit margin</step>

Now I will calculate the profit margin by subtracting the total cost from the revenue:
  Profit margin = Revenue - Total cost.
<tool name="python_code">
code: "revenue
 = 40
profit_margin = revenue - total_cost

profit_margin"
</tool>
<tool_response>
The profit margin is $31.
</tool_response>

<step num=4>Providing the final answer</step>
The total cost of one robe is $9, the revenue from selling one robe is $40, and the profit margin is $31.
</think>

The total cost of one robe is $9, the revenue from selling one robe is $40, and the profit margin is $31.

Model Overview

Performance

(Thinker-1-NoTools refers to no actual tool calls, all hallucinated.)

	Qwen3-30B-A3B Thinking	Qwen3-4B Thinking	Qwen3-4B-Thinking-2507	Thinker-1-NoTools
Knowledge
GPQA	65.8	55.9	65.8	66.2
Reasoning
AIME25	70.9	65.6	81.3	81.9

$ For reproducibility, we report the win rates evaluated by GPT-4.1. (i did too)

(same as original weights)

Best Practices

To achieve optimal performance, we recommend the following settings:

Sampling Parameters:
- We suggest using Temperature=0.6, TopP=0.95, TopK=20, and MinP=0.
- For supported frameworks, you can adjust the presence_penalty parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
Adequate Output Length: We recommend using an output length of 32,768 tokens for most queries. For benchmarking on highly complex problems, such as those found in math and programming competitions, we suggest setting the max output length to 81,920 tokens. This provides the model with sufficient space to generate detailed and comprehensive responses, thereby enhancing its overall performance.
Standardize Output Format: We recommend using prompts to standardize model outputs when benchmarking.
- Math Problems: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
- Multiple-Choice Questions: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the answer field with only the choice letter, e.g., "answer": "C"."
No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final output part and does not need to include the thinking content. It is implemented in the provided chat template in Jinja2. However, for frameworks that do not directly use the Jinja2 chat template, it is up to the developers to ensure that the best practice is followed.

@misc{qwen3technicalreport,
      title={Qwen3 Technical Report}, 
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388}, 
}

aldigobbler
/

thinker-1