JunHowie commited on
Commit
12d36eb
·
verified ·
1 Parent(s): db0e842

Upload folder using huggingface_hub

Browse files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
.mdl ADDED
Binary file (67 Bytes). View file
 
.msc ADDED
Binary file (1.26 kB). View file
 
.mv ADDED
@@ -0,0 +1 @@
 
 
1
+ Revision:master,CreatedAt:1753855337
README.md ADDED
@@ -0,0 +1,277 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ license_link: https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507/blob/main/LICENSE
5
+ pipeline_tag: text-generation
6
+ tags:
7
+ - Qwen3
8
+ - GPTQ
9
+ - Int8
10
+ - 量化修复
11
+ - vLLM
12
+ base_model:
13
+ - Qwen/Qwen3-30B-A3B-Instruct-2507
14
+ base_model_relation: quantized
15
+ ---
16
+ # 通义千问3-30B-A3B-Instruct-2507-GPTQ-Int8
17
+ 基础型 [Qwen/Qwen3-30B-A3B-Instruct-2507](https://www.modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507)
18
+
19
+
20
+ ### 【Vllm 单机4卡启动命令】
21
+ <i>注: 4卡启动一定要跟`--enable-expert-parallel` 否则该模型专家张量TP整除除不尽;2卡则不需要。 </i>
22
+ ```
23
+ CONTEXT_LENGTH=32768 # 262144
24
+
25
+ vllm serve \
26
+ tclf90/Qwen3-30B-A3B-Instruct-2507-GPTQ-Int8 \
27
+ --served-model-name Qwen3-30B-A3B-Instruct-2507-GPTQ-Int8 \
28
+ --enable-expert-parallel \
29
+ --swap-space 16 \
30
+ --max-num-seqs 512 \
31
+ --max-model-len $CONTEXT_LENGTH \
32
+ --max-seq-len-to-capture $CONTEXT_LENGTH \
33
+ --gpu-memory-utilization 0.9 \
34
+ --tensor-parallel-size 4 \
35
+ --trust-remote-code \
36
+ --disable-log-requests \
37
+ --host 0.0.0.0 \
38
+ --port 8000
39
+ ```
40
+
41
+ ### 【依赖】
42
+
43
+ ```
44
+ vllm>=0.9.2
45
+ ```
46
+
47
+ ### 【模型更新日期】
48
+ ```
49
+ 2025-07-30
50
+ 1. 首次commit
51
+ ```
52
+
53
+ ### 【模型列表】
54
+
55
+ | 文件大小 | 最近更新时间 |
56
+ |--------|--------------|
57
+ | `30GB` | `2025-07-30` |
58
+
59
+
60
+
61
+ ### 【模型下载】
62
+
63
+ ```python
64
+ from modelscope import snapshot_download
65
+ snapshot_download('tclf90/Qwen3-30B-A3B-Instruct-2507-GPTQ-Int8', cache_dir="本地路径")
66
+ ```
67
+
68
+
69
+ ### 【介绍】
70
+ # Qwen3-30B-A3B-Instruct-2507
71
+ <a href="https://chat.qwen.ai/?model=Qwen3-30B-A3B-2507" target="_blank" style="margin: 2px;">
72
+ <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
73
+ </a>
74
+
75
+ ## Highlights
76
+
77
+ We introduce the updated version of the **Qwen3-30B-A3B non-thinking mode**, named **Qwen3-30B-A3B-Instruct-2507**, featuring the following key enhancements:
78
+
79
+ - **Significant improvements** in general capabilities, including **instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage**.
80
+ - **Substantial gains** in long-tail knowledge coverage across **multiple languages**.
81
+ - **Markedly better alignment** with user preferences in **subjective and open-ended tasks**, enabling more helpful responses and higher-quality text generation.
82
+ - **Enhanced capabilities** in **256K long-context understanding**.
83
+
84
+ ![image/jpeg](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-2507/Qwen3-30B-A3B-Instruct-2507.jpeg)
85
+
86
+ ## Model Overview
87
+
88
+ **Qwen3-30B-A3B-Instruct-2507** has the following features:
89
+ - Type: Causal Language Models
90
+ - Training Stage: Pretraining & Post-training
91
+ - Number of Parameters: 30.5B in total and 3.3B activated
92
+ - Number of Paramaters (Non-Embedding): 29.9B
93
+ - Number of Layers: 48
94
+ - Number of Attention Heads (GQA): 32 for Q and 4 for KV
95
+ - Number of Experts: 128
96
+ - Number of Activated Experts: 8
97
+ - Context Length: **262,144 natively**.
98
+
99
+ **NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Meanwhile, specifying `enable_thinking=False` is no longer required.**
100
+
101
+ For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
102
+
103
+
104
+ ## Performance
105
+
106
+ | | Deepseek-V3-0324 | GPT-4o-0327 | Gemini-2.5-Flash Non-Thinking | Qwen3-235B-A22B Non-Thinking | Qwen3-30B-A3B Non-Thinking | Qwen3-30B-A3B-Instruct-2507 |
107
+ |--- | --- | --- | --- | --- | --- | --- |
108
+ | **Knowledge** | | | | | | |
109
+ | MMLU-Pro | **81.2** | 79.8 | 81.1 | 75.2 | 69.1 | 78.4 |
110
+ | MMLU-Redux | 90.4 | **91.3** | 90.6 | 89.2 | 84.1 | 89.3 |
111
+ | GPQA | 68.4 | 66.9 | **78.3** | 62.9 | 54.8 | 70.4 |
112
+ | SuperGPQA | **57.3** | 51.0 | 54.6 | 48.2 | 42.2 | 53.4 |
113
+ | **Reasoning** | | | | | | |
114
+ | AIME25 | 46.6 | 26.7 | **61.6** | 24.7 | 21.6 | 61.3 |
115
+ | HMMT25 | 27.5 | 7.9 | **45.8** | 10.0 | 12.0 | 43.0 |
116
+ | ZebraLogic | 83.4 | 52.6 | 57.9 | 37.7 | 33.2 | **90.0** |
117
+ | LiveBench 20241125 | 66.9 | 63.7 | **69.1** | 62.5 | 59.4 | 69.0 |
118
+ | **Coding** | | | | | | |
119
+ | LiveCodeBench v6 (25.02-25.05) | **45.2** | 35.8 | 40.1 | 32.9 | 29.0 | 43.2 |
120
+ | MultiPL-E | 82.2 | 82.7 | 77.7 | 79.3 | 74.6 | **83.8** |
121
+ | Aider-Polyglot | 55.1 | 45.3 | 44.0 | **59.6** | 24.4 | 35.6 |
122
+ | **Alignment** | | | | | | |
123
+ | IFEval | 82.3 | 83.9 | 84.3 | 83.2 | 83.7 | **84.7** |
124
+ | Arena-Hard v2* | 45.6 | 61.9 | 58.3 | 52.0 | 24.8 | **69.0** |
125
+ | Creative Writing v3 | 81.6 | 84.9 | 84.6 | 80.4 | 68.1 | **86.0** |
126
+ | WritingBench | 74.5 | 75.5 | 80.5 | 77.0 | 72.2 | **85.5** |
127
+ | **Agent** | | | | | | |
128
+ | BFCL-v3 | 64.7 | 66.5 | 66.1 | **68.0** | 58.6 | 65.1 |
129
+ | TAU1-Retail | 49.6 | 60.3# | **65.2** | 65.2 | 38.3 | 59.1 |
130
+ | TAU1-Airline | 32.0 | 42.8# | **48.0** | 32.0 | 18.0 | 40.0 |
131
+ | TAU2-Retail | **71.1** | 66.7# | 64.3 | 64.9 | 31.6 | 57.0 |
132
+ | TAU2-Airline | 36.0 | 42.0# | **42.5** | 36.0 | 18.0 | 38.0 |
133
+ | TAU2-Telecom | **34.0** | 29.8# | 16.9 | 24.6 | 18.4 | 12.3 |
134
+ | **Multilingualism** | | | | | | |
135
+ | MultiIF | 66.5 | 70.4 | 69.4 | 70.2 | **70.8** | 67.9 |
136
+ | MMLU-ProX | 75.8 | 76.2 | **78.3** | 73.2 | 65.1 | 72.0 |
137
+ | INCLUDE | 80.1 | 82.1 | **83.8** | 75.6 | 67.8 | 71.9 |
138
+ | PolyMATH | 32.2 | 25.5 | 41.9 | 27.0 | 23.3 | **43.1** |
139
+
140
+ *: For reproducibility, we report the win rates evaluated by GPT-4.1.
141
+
142
+ \#: Results were generated using GPT-4o-20241120, as access to the native function calling API of GPT-4o-0327 was unavailable.
143
+
144
+
145
+ ## Quickstart
146
+
147
+ The code of Qwen3-MoE has been in the latest Hugging Face `transformers` and we advise you to use the latest version of `transformers`.
148
+
149
+ With `transformers<4.51.0`, you will encounter the following error:
150
+ ```
151
+ KeyError: 'qwen3_moe'
152
+ ```
153
+
154
+ The following contains a code snippet illustrating how to use the model generate content based on given inputs.
155
+ ```python
156
+ from transformers import AutoModelForCausalLM, AutoTokenizer
157
+
158
+ model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"
159
+
160
+ # load the tokenizer and the model
161
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
162
+ model = AutoModelForCausalLM.from_pretrained(
163
+ model_name,
164
+ torch_dtype="auto",
165
+ device_map="auto"
166
+ )
167
+
168
+ # prepare the model input
169
+ prompt = "Give me a short introduction to large language model."
170
+ messages = [
171
+ {"role": "user", "content": prompt}
172
+ ]
173
+ text = tokenizer.apply_chat_template(
174
+ messages,
175
+ tokenize=False,
176
+ add_generation_prompt=True,
177
+ )
178
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
179
+
180
+ # conduct text completion
181
+ generated_ids = model.generate(
182
+ **model_inputs,
183
+ max_new_tokens=16384
184
+ )
185
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
186
+
187
+ content = tokenizer.decode(output_ids, skip_special_tokens=True)
188
+
189
+ print("content:", content)
190
+ ```
191
+
192
+ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` or to create an OpenAI-compatible API endpoint:
193
+ - SGLang:
194
+ ```shell
195
+ python -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B-Instruct-2507 --context-length 262144
196
+ ```
197
+ - vLLM:
198
+ ```shell
199
+ vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507 --max-model-len 262144
200
+ ```
201
+
202
+ **Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
203
+
204
+ For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers have also supported Qwen3.
205
+
206
+ ## Agentic Use
207
+
208
+ Qwen3 excels in tool calling capabilities. We recommend using [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) to make the best use of agentic ability of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.
209
+
210
+ To define the available tools, you can use the MCP configuration file, use the integrated tool of Qwen-Agent, or integrate other tools by yourself.
211
+ ```python
212
+ from qwen_agent.agents import Assistant
213
+
214
+ # Define LLM
215
+ llm_cfg = {
216
+ 'model': 'Qwen3-30B-A3B-Instruct-2507',
217
+
218
+ # Use a custom endpoint compatible with OpenAI API:
219
+ 'model_server': 'http://localhost:8000/v1', # api_base
220
+ 'api_key': 'EMPTY',
221
+ }
222
+
223
+ # Define Tools
224
+ tools = [
225
+ {'mcpServers': { # You can specify the MCP configuration file
226
+ 'time': {
227
+ 'command': 'uvx',
228
+ 'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
229
+ },
230
+ "fetch": {
231
+ "command": "uvx",
232
+ "args": ["mcp-server-fetch"]
233
+ }
234
+ }
235
+ },
236
+ 'code_interpreter', # Built-in tools
237
+ ]
238
+
239
+ # Define Agent
240
+ bot = Assistant(llm=llm_cfg, function_list=tools)
241
+
242
+ # Streaming generation
243
+ messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
244
+ for responses in bot.run(messages=messages):
245
+ pass
246
+ print(responses)
247
+ ```
248
+
249
+ ## Best Practices
250
+
251
+ To achieve optimal performance, we recommend the following settings:
252
+
253
+ 1. **Sampling Parameters**:
254
+ - We suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`.
255
+ - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
256
+
257
+ 2. **Adequate Output Length**: We recommend using an output length of 16,384 tokens for most queries, which is adequate for instruct models.
258
+
259
+ 3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
260
+ - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
261
+ - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `"answer": "C"`."
262
+
263
+ ### Citation
264
+
265
+ If you find our work helpful, feel free to give us a cite.
266
+
267
+ ```
268
+ @misc{qwen3technicalreport,
269
+ title={Qwen3 Technical Report},
270
+ author={Qwen Team},
271
+ year={2025},
272
+ eprint={2505.09388},
273
+ archivePrefix={arXiv},
274
+ primaryClass={cs.CL},
275
+ url={https://arxiv.org/abs/2505.09388},
276
+ }
277
+ ```
config.json ADDED
@@ -0,0 +1,46 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "name_or_path": "tclf90/Qwen3-30B-A3B-Instruct-2507-GPTQ-Int8",
3
+ "architectures": [
4
+ "Qwen3MoeForCausalLM"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.0,
8
+ "bos_token_id": 151643,
9
+ "decoder_sparse_step": 1,
10
+ "eos_token_id": 151645,
11
+ "head_dim": 128,
12
+ "hidden_act": "silu",
13
+ "hidden_size": 2048,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 6144,
16
+ "max_position_embeddings": 262144,
17
+ "max_window_layers": 48,
18
+ "mlp_only_layers": [],
19
+ "model_type": "qwen3_moe",
20
+ "moe_intermediate_size": 768,
21
+ "norm_topk_prob": true,
22
+ "num_attention_heads": 32,
23
+ "num_experts": 128,
24
+ "num_experts_per_tok": 8,
25
+ "num_hidden_layers": 48,
26
+ "num_key_value_heads": 4,
27
+ "output_router_logits": false,
28
+ "rms_norm_eps": 1e-06,
29
+ "rope_scaling": null,
30
+ "rope_theta": 10000000,
31
+ "router_aux_loss_coef": 0.001,
32
+ "sliding_window": null,
33
+ "tie_word_embeddings": false,
34
+ "torch_dtype": "float16",
35
+ "transformers_version": "4.51.0",
36
+ "use_cache": true,
37
+ "use_sliding_window": false,
38
+ "vocab_size": 151936,
39
+ "quantization_config": {
40
+ "quant_method": "gptq",
41
+ "bits": 8,
42
+ "group_size": 128,
43
+ "sym": true,
44
+ "desc_act": false
45
+ }
46
+ }
configuration.json ADDED
@@ -0,0 +1 @@
 
 
1
+ {"framework":"Pytorch","task":"text-generation"}
generation_config.json ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "bos_token_id": 151643,
3
+ "do_sample": true,
4
+ "eos_token_id": [
5
+ 151645,
6
+ 151643
7
+ ],
8
+ "pad_token_id": 151643,
9
+ "temperature": 0.7,
10
+ "top_k": 20,
11
+ "top_p": 0.8,
12
+ "transformers_version": "4.51.0"
13
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9589b453f072aaf8d651a960112fe4100497ea5d4b27c9e62475efee95970ef2
3
+ size 5000063336
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b235ebde0d51ce29d9ae7f6c91717261845373a066c716103afdea99fa5c89eb
3
+ size 5001147568
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3fe72c1a752082a4d6faa5984ff83e86d72da794835dd86202a9ab84055a6a8f
3
+ size 5001152536
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:85436e1a3b4654eb19a1c22959498fb2f78554eec9615d527687f914ad675276
3
+ size 5001147400
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:66addfa706c398d0f4b30102a53fc4f8861e3642da3fc17fa8397380bbde216f
3
+ size 5001152176
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5de0c85b3757eebeeacb011b480523f9c2002db83cd230ef3f97e2e456a11d3a
3
+ size 5000659728
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dbdc7238b3af626e91a390a8b52184676c18d76705799e6564f240fa8aabb3f6
3
+ size 1993140504
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
3
+ size 11422654
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "151643": {
5
+ "content": "<|endoftext|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "151644": {
13
+ "content": "<|im_start|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "151645": {
21
+ "content": "<|im_end|>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "151646": {
29
+ "content": "<|object_ref_start|>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "151647": {
37
+ "content": "<|object_ref_end|>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "151648": {
45
+ "content": "<|box_start|>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": true
51
+ },
52
+ "151649": {
53
+ "content": "<|box_end|>",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": true
59
+ },
60
+ "151650": {
61
+ "content": "<|quad_start|>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": true
67
+ },
68
+ "151651": {
69
+ "content": "<|quad_end|>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "151652": {
77
+ "content": "<|vision_start|>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "151653": {
85
+ "content": "<|vision_end|>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": true
91
+ },
92
+ "151654": {
93
+ "content": "<|vision_pad|>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": true
99
+ },
100
+ "151655": {
101
+ "content": "<|image_pad|>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": true
107
+ },
108
+ "151656": {
109
+ "content": "<|video_pad|>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": true
115
+ },
116
+ "151657": {
117
+ "content": "<tool_call>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": false
123
+ },
124
+ "151658": {
125
+ "content": "</tool_call>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": false
131
+ },
132
+ "151659": {
133
+ "content": "<|fim_prefix|>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": false
139
+ },
140
+ "151660": {
141
+ "content": "<|fim_middle|>",
142
+ "lstrip": false,
143
+ "normalized": false,
144
+ "rstrip": false,
145
+ "single_word": false,
146
+ "special": false
147
+ },
148
+ "151661": {
149
+ "content": "<|fim_suffix|>",
150
+ "lstrip": false,
151
+ "normalized": false,
152
+ "rstrip": false,
153
+ "single_word": false,
154
+ "special": false
155
+ },
156
+ "151662": {
157
+ "content": "<|fim_pad|>",
158
+ "lstrip": false,
159
+ "normalized": false,
160
+ "rstrip": false,
161
+ "single_word": false,
162
+ "special": false
163
+ },
164
+ "151663": {
165
+ "content": "<|repo_name|>",
166
+ "lstrip": false,
167
+ "normalized": false,
168
+ "rstrip": false,
169
+ "single_word": false,
170
+ "special": false
171
+ },
172
+ "151664": {
173
+ "content": "<|file_sep|>",
174
+ "lstrip": false,
175
+ "normalized": false,
176
+ "rstrip": false,
177
+ "single_word": false,
178
+ "special": false
179
+ },
180
+ "151665": {
181
+ "content": "<tool_response>",
182
+ "lstrip": false,
183
+ "normalized": false,
184
+ "rstrip": false,
185
+ "single_word": false,
186
+ "special": false
187
+ },
188
+ "151666": {
189
+ "content": "</tool_response>",
190
+ "lstrip": false,
191
+ "normalized": false,
192
+ "rstrip": false,
193
+ "single_word": false,
194
+ "special": false
195
+ },
196
+ "151667": {
197
+ "content": "<think>",
198
+ "lstrip": false,
199
+ "normalized": false,
200
+ "rstrip": false,
201
+ "single_word": false,
202
+ "special": false
203
+ },
204
+ "151668": {
205
+ "content": "</think>",
206
+ "lstrip": false,
207
+ "normalized": false,
208
+ "rstrip": false,
209
+ "single_word": false,
210
+ "special": false
211
+ }
212
+ },
213
+ "additional_special_tokens": [
214
+ "<|im_start|>",
215
+ "<|im_end|>",
216
+ "<|object_ref_start|>",
217
+ "<|object_ref_end|>",
218
+ "<|box_start|>",
219
+ "<|box_end|>",
220
+ "<|quad_start|>",
221
+ "<|quad_end|>",
222
+ "<|vision_start|>",
223
+ "<|vision_end|>",
224
+ "<|vision_pad|>",
225
+ "<|image_pad|>",
226
+ "<|video_pad|>"
227
+ ],
228
+ "bos_token": null,
229
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = '' %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}",
230
+ "clean_up_tokenization_spaces": false,
231
+ "eos_token": "<|im_end|>",
232
+ "errors": "replace",
233
+ "model_max_length": 262144,
234
+ "pad_token": "<|endoftext|>",
235
+ "split_special_tokens": false,
236
+ "tokenizer_class": "Qwen2Tokenizer",
237
+ "unk_token": null,
238
+ "add_bos_token": false
239
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff