Melvin56 committed · Commit 1966cfb · verified · 1 Parent(s): 11742b9

Upload model via Google Colab
.gitattributes CHANGED
@@ -33,3 +33,13 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ hammer2.1-1.5b-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ hammer2.1-1.5b-Q2_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ hammer2.1-1.5b-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ hammer2.1-1.5b-Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+ hammer2.1-1.5b-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ hammer2.1-1.5b-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ hammer2.1-1.5b-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ hammer2.1-1.5b-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ hammer2.1-1.5b-f16.gguf filter=lfs diff=lfs merge=lfs -text
+ imatrix.dat filter=lfs diff=lfs merge=lfs -text
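
For context, rules like these are how Git LFS routes the large binaries through LFS instead of plain Git. The upload tooling here wrote one rule per file; running `git lfs track` by hand would produce equivalent pattern-based entries (a minimal sketch, assuming Git LFS is installed):
```
git lfs install
git lfs track "*.gguf"        # appends: *.gguf filter=lfs diff=lfs merge=lfs -text
git lfs track "imatrix.dat"   # appends a matching rule for the imatrix file
```
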
README.md ADDED
@@ -0,0 +1,238 @@
---
license: cc-by-nc-4.0
datasets:
- Salesforce/xlam-function-calling-60k
- MadeAgents/xlam-irrelevance-7.5k
base_model:
- Qwen/Qwen2.5-Coder-1.5B-Instruct
---

# Hammer2.1-1.5b Function Calling Model

## Introduction

Hammer refers to a series of lightweight Large Action Models. Currently, we are releasing Hammer 2.1 models ([0.5B](https://huggingface.co/MadeAgents/Hammer2.1-0.5b), [1.5B](https://huggingface.co/MadeAgents/Hammer2.1-1.5b), [3B](https://huggingface.co/MadeAgents/Hammer2.1-3b), and [7B](https://huggingface.co/MadeAgents/Hammer2.1-7b)) with strong function-calling capability. These models are based on the Qwen 2.5 coder series and utilize [function masking techniques](https://arxiv.org/abs/2410.04587) and other advanced technologies. The Hammer 2.1 series brings significant enhancements while maintaining Hammer 2.0's basic single-turn interaction functionality and further strengthening other capabilities.

## Model Details
The Hammer 2.1 models, fine-tuned from the Qwen 2.5 coder series, inherit Hammer 2.0's advantages and are enhanced as follows:
- <span style="color: red;">Multi-Step Function Calling:</span> The assistant can perform multiple internal function calls to handle a single user request, actively planning and gathering information to fulfill complex tasks.
- <span style="color: red;">Multi-Turn Function Calling:</span> Enables continuous and context-aware interactions over multiple exchanges, with each turn potentially containing multiple steps, for a more natural conversation experience.
- Enhanced Irrelevant-Information Inspection: Better at identifying when the provided functions are irrelevant to a user query, responding with a non-function-call message in such cases.
20
+
21
+ ## Evaluation
22
+ The evaluation results of Hammer 2.1 models on the Berkeley Function-Calling Leaderboard (BFCL-v3) are presented in the following table:
23
+ <div style="text-align: center;">
24
+ <img src="v2_figures/bfcl.png" alt="overview" width="1000" style="margin: auto;">
25
+ </div>
26
+
27
+ Our Hammer 2.1 series consistently achieves corresponding best performance at comparable scales. The 7B/3B/1.5B model outperform most function calling enchanced models.
28
+
29
+ In addition, we evaluated the Hammer 2.1 models on other academic benchmarks to further demonstrate the generalization ability of our models.
30
+
31
+ <div style="text-align: center;">
32
+ <img src="v2_figures/others-v2.png" alt="overview" width="1000" style="margin: auto;">
33
+ </div>
34
+
35
+ Hammer 2.1 models showcase highly stable performance, suggesting the robustness of Hammer 2.1 series. In contrast, the baseline approaches display varying levels of effectiveness.
36
+
37
+
38
+
39
+
## Requirements
The code for the Hammer 2.1 models is supported in the latest Hugging Face transformers, and we advise you to install `transformers>=4.47.0`.
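
If your environment has an older version, upgrading is a one-liner (any recent pip works):
```
pip install "transformers>=4.47.0"
```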

## How to Use
Hammer models offer flexibility in deployment and usage, fully supporting both **vLLM** deployment and **Hugging Face Transformers** tool calling. Below are the specifics on how to make use of these features:

### Using vLLM
#### Option 1: Using the Hammer client (Recommended)

Before using vLLM, first clone the Hammer code repository and change into the `Hammer` directory:
```
git clone https://github.com/MadeAgents/Hammer.git
cd Hammer
```

vLLM offers efficient serving with lower latency. To serve the model with vLLM:
```
vllm serve MadeAgents/Hammer2.1-1.5b --host 0.0.0.0 --port 8000 --tensor-parallel-size 1
```
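
Once the server reports it is ready, you can optionally sanity-check the endpoint before wiring up a client; this probe (assuming the host and port above) lists the served models via vLLM's OpenAI-compatible API:
```
curl http://localhost:8000/v1/models
```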
Once the model is served, you can use the following Hammer client to interact with it for function calling:
~~~
from client import HammerChatCompletion, HammerConfig

config = HammerConfig(base_url="http://localhost:8000/v1/", model="MadeAgents/Hammer2.1-1.5b")
llm = HammerChatCompletion.from_config(config)

# Example conversation
messages = [
    {"role": "user", "content": "What's the weather like in New York?"},
    {"role": "assistant", "content": '```\n{"name": "get_weather", "arguments": {"location": "New York, NY", "unit": "celsius"}}\n```'},
    {"role": "tool", "name": "get_weather", "content": '{"temperature": 72, "description": "Partly cloudy"}'},
    {"role": "user", "content": "Now, search for the weather in San Francisco."}
]

# Example function definitions (optional)
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    },
    {
        "name": "respond",
        "description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The content of the message to respond to."}
            },
            "required": ["message"]
        }
    }
]

response = llm.completion(messages, tools=tools)
print(response)
~~~

#### Option 2: Using vLLM's built-in tool calling
Hammer2.1 supports vLLM's built-in tool calling. This functionality requires vLLM >= 0.6. To enable it, start vLLM's OpenAI-compatible server with:
~~~
vllm serve MadeAgents/Hammer2.1-1.5b --enable-auto-tool-choice --tool-call-parser hermes
~~~
And then use it in the same way you would use GPT's tool calling:
~~~
from openai import OpenAI

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                        "default": "celsius"
                    },
                },
                "required": ["location", "format"],
            },
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_n_day_weather_forecast",
            "description": "Get an N-day weather forecast",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                        "default": "celsius"
                    },
                    "num_days": {
                        "type": "integer",
                        "description": "The number of days to forecast",
                        "default": 1
                    }
                },
                "required": ["location", "format", "num_days"]
            },
        }
    },
]

openai_api_key = "None"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

query = "What's the weather like today in San Francisco"

chat_response = client.chat.completions.create(
    model="MadeAgents/Hammer2.1-1.5b",
    messages=[{"role": "user", "content": query}],
    tools=tools,
    temperature=0
)
print(chat_response.choices[0].message.content)
~~~
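
With `--enable-auto-tool-choice`, parsed calls are typically returned in `message.tool_calls` rather than in `message.content`; a minimal sketch of reading them, using the standard OpenAI response schema:
~~~
message = chat_response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        # each parsed call carries a function name and JSON-encoded arguments
        print(call.function.name, call.function.arguments)
~~~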

### Using Hugging Face Transformers
Hammer2.1's chat template also includes a tool-calling template, meaning you can use Hugging Face transformers' tool-calling support. Here is a simple example of how to use the model with Transformers:
~~~
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MadeAgents/Hammer2.1-1.5b")
model = AutoModelForCausalLM.from_pretrained("MadeAgents/Hammer2.1-1.5b", torch_dtype=torch.bfloat16, device_map="auto")

# Example conversation
messages = [
    {"role": "user", "content": "What's the weather like in New York?"},
    {"role": "assistant", "content": '```\n{"name": "get_weather", "arguments": {"location": "New York, NY", "unit": "celsius"}}\n```'},
    {"role": "tool", "name": "get_weather", "content": '{"temperature": 72, "description": "Partly cloudy"}'},
    {"role": "user", "content": "Now, search for the weather in San Francisco."}
]

# Example function definitions (optional)
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    },
    {
        "name": "respond",
        "description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The content of the message to respond to."}
            },
            "required": ["message"]
        }
    }
]

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):], skip_special_tokens=True))
~~~
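
To continue a multi-step exchange, you can append the generated call and the tool's result back onto `messages` and re-run the same template. A minimal sketch (the tool payload below is fabricated for illustration):
~~~
# hypothetical continuation: echo the model's call back, then supply a made-up tool result
reply = tokenizer.decode(out[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "tool", "name": "get_weather", "content": '{"temperature": 65, "description": "Foggy"}'})

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):], skip_special_tokens=True))
~~~
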
hammer2.1-1.5b-Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:acdee1de9bcfd3ccb83ac835bea1fbb3aff4f84375671788c8a9affb6161e1a0
size 675957696
hammer2.1-1.5b-Q2_K_S.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:302a9553464611ccce596b7cb2a36249bc6829686327d0a3d71ab1ca49c306db
size 639787968
hammer2.1-1.5b-Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1bfda68423c6c731420942309b51fae8c922d72d1c99eb11d45436be621b5dd7
size 823831488
hammer2.1-1.5b-Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:47ee8d274dfea0c642c3ac9b637dffed6190abe3696a613303269b0343a00c11
size 937188288
hammer2.1-1.5b-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c3447bf9d0dbbedcff33f4f64769f3add82877790a23940d554b5099056a0110
size 985701312
hammer2.1-1.5b-Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f4507c4e40a6ac081e51f1930e4eede27efabfdc5f3152d7ca3f48ec9e0cad1c
size 1124703168
hammer2.1-1.5b-Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8426161e3ceb21e56aa9714a9ede19297db9d1c358771781c4892f1d1df17bf9
size 1272392640
hammer2.1-1.5b-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9537ffd70ef8024e49e8dbf8bf7cc1579e6b2102ad279a5728638112dad85d1c
size 1646125024
hammer2.1-1.5b-f16.gguf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:badb2cf2e8e20e742c7972efd412cd331f8d0b8ea2e0f473c633a823998251e8
size 3092830848
imatrix.dat ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5824be803ac5ac5ae23cb5592bb3034048045e4c99a99aee07a68a15478c9eb5
size 2042233
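
The quantized GGUF files in this commit can be run directly with llama.cpp. A minimal sketch, assuming a recent llama.cpp build that provides the `llama-cli` binary and the Q4_K_M file downloaded locally:
```
# run a one-shot prompt against the Q4_K_M quant (-n caps the generated tokens)
llama-cli -m hammer2.1-1.5b-Q4_K_M.gguf -p "What's the weather like in New York?" -n 128
```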