feat!: v2025-02-01, Thai reasoning traces, improved Thai and instruction-following performance
- README.md +16 -5
- model-00001-of-00002.safetensors +1 -1
- model-00002-of-00002.safetensors +1 -1
- special_tokens_map.json +16 -0
- tokenizer_config.json +4 -0
README.md
CHANGED
@@ -18,6 +18,10 @@ datasets:
 
 Typhoon T1 3B (Research Preview) is built on top of [Typhoon 2 3B Instruct](https://huggingface.co/scb10x/llama3.2-typhoon2-3b-instruct). It has improved performance on challenging benchmarks like GPQA, MMLU Pro, and the AI Mathematics Olympiad validation set.
 
+> [!NOTE]
+> 📅
+> **2025-02-01 Update**: We have released a new version of Typhoon T1 3B (Research Preview) with the ability to 🇹🇭 *generate Thai reasoning traces*, improved *Thai performance in general*, and *enhanced instruction following*. This version has a comparable level of English performance to `v2025-01-23`.
+
 ## Key Points
 
 - **Typhoon T1** is a new family of open reasoning models developed by SCB 10X
@@ -25,6 +29,7 @@ Typhoon T1 3B (Research Preview) is built on top of [Typhoon 2 3B Instruct](http
 - Typhoon T1 3B (Research Preview) is a **fast**, **low-compute** model that is nonetheless **capable** across a variety of tasks by scaling test-time compute, enabling the model to think longer before giving a final answer. Typhoon T1 3B (Research Preview) is able to **_reason across domains_**, unlike many open reasoning models that are limited to mathematics and coding
 - We **open** our data pipeline and training recipe for this model, which was trained without distilling from other reasoning models
 - We introduce **a new thinking paradigm** for reasoning models, structured thinking, in which auxiliary tokens help structure the model's thinking process. In our experiments, this approach outperforms the common variant that separates only the thought and response parts
+- Typhoon T1 3B (Research Preview) `v2025-02-01` is the first reasoning model that we intentionally equipped with the ability to **generate Thai reasoning traces**, improving the *transparency* and *interpretability* of the model
 
 For more technical details, please visit our [technical blog](https://blog.opentyphoon.ai/introducing-typhoon-t1-a-family-of-open-reasoning-models-research-preview-22daacc88662).
 
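Since structured thinking only adds delimiting tokens around the reasoning trace, consuming the output is plain string handling. A minimal sketch follows; the `<thoughts>`/`<response>` tag names are placeholders for illustration, not confirmed auxiliary tokens of this model:

```python
import re

def split_structured_output(text: str) -> dict[str, str]:
    """Split a structured-thinking completion into thought and response parts.

    The <thoughts>/<response> tag names are illustrative placeholders; swap in
    whatever auxiliary tokens the model actually emits.
    """
    thoughts = re.search(r"<thoughts>(.*?)</thoughts>", text, re.DOTALL)
    response = re.search(r"<response>(.*?)</response>", text, re.DOTALL)
    return {
        "thoughts": thoughts.group(1).strip() if thoughts else "",
        "response": response.group(1).strip() if response else text.strip(),
    }

print(split_structured_output(
    "<thoughts>23 + 19 = 42</thoughts><response>42</response>"
))
```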
@@ -36,20 +41,20 @@ For more technical details, please visit our [technical blog](https://blog.opent
 |-----------------------|-------------------|-------------------------|----------------|----------|
 | Typhoon 2 3B Instruct | 56.63 | 66 | 27.01 | 0 |
 | Typhoon T1 3B (semi) | 59.59 | 68.99 | 25.89 | 0 |
-| **Typhoon T1 3B (Research Preview)** | **62.40** | **69.87** | **31.7** | **2.22** |
+| **Typhoon T1 3B (Research Preview)** v2025-01-23 | **62.40** | **69.87** | **31.7** | **2.22** |
 
 ### MMLU Pro (↑), 5-shot
 
 | Model name | Average | Math | Health | Physics | Business | Biology | Chemistry | Computer Science | Economics | Engineering | Philosophy | Other | History | Psychology | Law |
 |-----------------------|-----------|-----------|-----------|----------|-----------|-----------|-----------|------------------|-----------|-------------|------------|-----------|-----------|------------|-----------|
 | Typhoon 2 3B Instruct | 26.7 | 26.8 | 33.62 | 23.4 | 25.35 | 43.38 | 19.88 | 28.29 | 35.43 | 18.37 | 28.06 | 27.92 | 25.72 | 37.84 | 13.17 |
-| **Typhoon T1 3B (Research Preview)** | **30.65** | **30.57** | **36.19** | **27.1** | **31.69** | **50.77** | **22.17** | **31.22** | **38.86** | **21.98** | **30.66** | **32.79** | **26.51** | **43.36** | **17.26** |
+| **Typhoon T1 3B (Research Preview)** v2025-01-23 | **30.65** | **30.57** | **36.19** | **27.1** | **31.69** | **50.77** | **22.17** | **31.22** | **38.86** | **21.98** | **30.66** | **32.79** | **26.51** | **43.36** | **17.26** |
 
 ## Model description
 
 - **Model type**: A 3B instruct decoder-only model based on Llama architecture.
-- **Requirement**: transformers 4.46.1 or newer.
+- **Requirement**: `transformers` 4.46.1 or newer.
 - **Primary Language(s)**: English 🇬🇧 and Thai 🇹🇭
 - **License**: [Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE)
 
 
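Given the `transformers` 4.46.1 requirement above, a small guard one might add before loading the model; a sketch only (`packaging` is already a dependency of `transformers`):

```python
import transformers
from packaging import version

# The model card requires transformers 4.46.1 or newer.
if version.parse(transformers.__version__) < version.parse("4.46.1"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old; need >= 4.46.1"
    )
```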
@@ -62,12 +67,15 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 
 model_id = "scb10x/llama-3.2-typhoon-t1-3b-research-preview"
+revision = "main"  # the latest version; comment this line to use the previous one
+# revision = "v2025-01-23"  # uncomment this line to use the previous version
 
-tokenizer = AutoTokenizer.from_pretrained(model_id)
+tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
 model = AutoModelForCausalLM.from_pretrained(
     model_id,
     torch_dtype=torch.bfloat16,
     device_map="auto",
+    revision=revision,
 )
 
 messages = [
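The diff elides the middle of this usage example; for context, a minimal sketch of how the snippet likely continues (the prompt and `max_new_tokens` are illustrative, not official defaults):

```python
# Continues the snippet above: build a chat, apply the chat template, generate,
# and decode only the newly generated tokens. A generous max_new_tokens matters
# because the model emits its reasoning trace before the final answer.
messages = [
    {"role": "user", "content": "What is 23 + 19?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=2048)
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```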
@@ -102,6 +110,9 @@ print(tokenizer.decode(response, skip_special_tokens=True))
 ```bash
 pip install vllm
 vllm serve scb10x/llama-3.2-typhoon-t1-3b-research-preview
+
+# To serve the previous version, add the revision parameter as shown below:
+# vllm serve scb10x/llama-3.2-typhoon-t1-3b-research-preview --revision v2025-01-23
 # see more information at https://docs.vllm.ai/
 ```
 
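Once serving, vLLM exposes an OpenAI-compatible endpoint; a sketch of querying it (assumes vLLM's default port 8000 and `pip install openai`):

```python
# Query the vLLM server through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
completion = client.chat.completions.create(
    model="scb10x/llama-3.2-typhoon-t1-3b-research-preview",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=2048,
)
print(completion.choices[0].message.content)
```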
model-00001-of-00002.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ff07f295caaeedc3964f764dfac7fe37adcc145888e4236ccdb4b74e5798d098
 size 4965799096
model-00002-of-00002.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:1a012b37e95cafcde00213a6ffc094aea909e392539f8a09872d3aa2437ad973
 size 1459729952
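For anyone pulling the updated shards, a small sketch to confirm a downloaded file matches the `oid sha256:` recorded in its LFS pointer (local paths are illustrative):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file and return its hex SHA-256, as recorded in the LFS pointer."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Compare against the oid in the corresponding pointer file above.
print(sha256_of("model-00001-of-00002.safetensors"))
print(sha256_of("model-00002-of-00002.safetensors"))
```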
special_tokens_map.json
CHANGED
@@ -1,4 +1,20 @@
 {
+  "additional_special_tokens": [
+    {
+      "content": "<|eot_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<|eom_id|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    }
+  ],
   "bos_token": {
     "content": "<|begin_of_text|>",
     "lstrip": false,
tokenizer_config.json
CHANGED
@@ -2049,6 +2049,10 @@
       "special": true
     }
   },
+  "additional_special_tokens": [
+    "<|eot_id|>",
+    "<|eom_id|>"
+  ],
   "bos_token": "<|begin_of_text|>",
   "chat_template": "{{- bos_token }}\n{%- if custom_tools is defined %}\n    {%- set tools = custom_tools %}\n{%- endif %}\n{%- if not date_string is defined %}\n    {%- set date_string = \"26 Jul 2024\" %}\n{%- endif %}\n{%- if not tools is defined %}\n    {%- set tools = none %}\n{%- endif %}\n\n{#- This block extracts the system message, so we can slot it into the right place. #}\n{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content']|trim %}\n    {%- set messages = messages[1:] %}\n{%- else %}\n    {%- set system_message = \"\" %}\n{%- endif %}\n\n{#- System message + builtin tools #}\n{{- \"<|start_header_id|>system<|end_header_id|>\\n\\n\" }}\n{{- \"Cutting Knowledge Date: December 2023\\n\" }}\n{{- \"Today Date: \" + date_string + \"\\n\\n\" }}\n{{- system_message }}\n{%- if tools is not none %}\n    {{- \"\\n\" }}\n    {{- \"You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose.\" }}\n    {{- \"If none of the function can be used, point it out. If the given question lacks the parameters required by the function, also point it out.\" }}\n    {{- \"You should only return the function call in tools call sections.\" }}\n    {{- \"If you decide to invoke any of the function(s), you MUST put it in the format of [Function(arguments1={{params_name1: params_value1,params_name2: params_value2, ...}}, name1=function_name1), Function(arguments2={{params}}, name2=function_name2) , ...]\"}}\n    {{- \"You SHOULD NOT include any other text in the response.\\nHere is a list of functions in JSON format that you can invoke.\\n\" }}\n    {%- for t in tools %}\n        {{- t | tojson(indent=4) }}\n        {{- \"\\n\\n\" }}\n    {%- endfor %}\n{%- endif %}\n{{- \"<|eot_id|>\" }}\n\n\n{%- for message in messages %}\n    {%- if not (message.role == 'tool') %}\n        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'+ message['content'] | trim + '<|eot_id|>' }}\n    {%- elif message.role == \"tool\" %}\n        {{- \"<|start_header_id|>tool<|end_header_id|>\\n\\n\" }}\n        {%- if message.content is mapping or message.content is iterable %}\n            {{- message.content | tojson }}\n        {%- else %}\n            {{- message.content }}\n        {%- endif %}\n        {{- \"<|eot_id|>\" }}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}\n{%- endif %}",
   "clean_up_tokenization_spaces": true,