weathermanj committed on
Commit
577a31e
·
verified ·
1 Parent(s): 9575892

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +119 -53
README.md CHANGED
@@ -1,98 +1,123 @@
1
  ---
2
- language:
3
- - en
4
  license: other
5
- base_model: Qwen/Qwen2.5-3B-Instruct
6
  tags:
7
  - qwen
8
  - grpo
9
- - reinforcement-learning
10
- - instruction-tuning
11
- - mathematical-reasoning
12
- - gsm8k
 
 
 
 
13
  datasets:
14
  - gsm8k
15
  model-index:
16
  - name: Menda-3B-250
17
  results:
18
  - task:
19
- type: multiple-choice-qa
 
 
 
20
  name: ARC-Challenge
21
  metrics:
22
  - name: Accuracy
23
  type: accuracy
24
  value: 50.0
25
  - task:
26
- type: multiple-choice-qa
 
 
 
27
  name: BoolQ
28
  metrics:
29
  - name: Accuracy
30
  type: accuracy
31
  value: 80.0
32
  - task:
33
- type: multiple-choice-qa
 
 
 
34
  name: HellaSwag
35
  metrics:
36
  - name: Accuracy
37
  type: accuracy
38
  value: 40.0
39
  - task:
40
- type: multiple-choice-qa
41
- name: Lambada
42
- metrics:
43
- - name: Accuracy
44
- type: accuracy
45
- value: 70.0
46
- - task:
47
- type: multiple-choice-qa
48
- name: PIQA
49
- metrics:
50
- - name: Accuracy
51
- type: accuracy
52
- value: 90.0
53
- - task:
54
- type: multiple-choice-qa
55
- name: Winogrande
56
- metrics:
57
- - name: Accuracy
58
- type: accuracy
59
- value: 90.0
60
- - task:
61
  type: mmlu
62
- name: MMLU
63
  metrics:
64
- - name: Average
65
  type: accuracy
66
  value: 68.95
67
  ---
68
 
69
- # Menda-3B-250
70
 
71
- Menda-3B-250 is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) using Group Relative Policy Optimization (GRPO). This model represents the 250-step checkpoint from the training process.
72
 
73
  ## Model Details
74
 
75
  - **Base Model**: Qwen/Qwen2.5-3B-Instruct
76
  - **Training Method**: GRPO (Group Relative Policy Optimization)
77
  - **Training Steps**: 250
78
- - **Parameters**: 3B
79
  - **Context Length**: 32K tokens
80
  - **Training Data**: GSM8K (mathematical reasoning)
81
 
82
- ## Performance
 
 
 
 
 
83
 
84
- Based on extensive evaluation, the 250-step checkpoint shows surprisingly strong performance across multiple benchmarks:
85
 
86
- ### Core Benchmarks (0-shot)
87
 
88
- | Benchmark | Score |
89
- |-----------|-------|
90
- | ARC-Challenge | 50.0% |
91
- | BoolQ | 80.0% |
92
- | HellaSwag | 40.0% |
93
- | Lambada | 70.0% |
94
- | PIQA | 90.0% |
95
- | Winogrande | 90.0% |
96
 
97
  ### MMLU Performance
98
 
@@ -111,24 +136,44 @@ Based on extensive evaluation, the 250-step checkpoint shows surprisingly strong
111
  - **Efficient Training**: Achieves impressive results with minimal training (only 250 steps).
112
  - **Balanced Capabilities**: Maintains strong performance across diverse tasks without significant trade-offs.
113
 
114
- ## Usage
 
 
115
 
116
  ```python
117
  from transformers import AutoModelForCausalLM, AutoTokenizer
118
 
119
  model_name = "weathermanj/Menda-3B-250"
120
-
121
  model = AutoModelForCausalLM.from_pretrained(
122
  model_name,
123
  torch_dtype="auto",
124
  device_map="auto"
125
  )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
126
  tokenizer = AutoTokenizer.from_pretrained(model_name)
 
 
 
 
 
127
 
128
- prompt = "Give me a short introduction to large language models."
129
  messages = [
130
- {"role": "system", "content": "You are a helpful assistant."},
131
- {"role": "user", "content": prompt}
132
  ]
133
  text = tokenizer.apply_chat_template(
134
  messages,
@@ -149,6 +194,27 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
149
  print(response)
150
  ```
151
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
152
  ## Training Configuration
153
 
154
  The model was trained using the GRPO methodology with the following configuration:
@@ -162,4 +228,4 @@ The model was trained using the GRPO methodology with the following configuratio
162
 
163
  ## License
164
 
165
- This model is subject to the license of the original Qwen2.5-3B-Instruct model.
 
1
  ---
2
+ language: en
 
3
  license: other
 
4
  tags:
5
  - qwen
6
  - grpo
7
+ - instruct
8
+ - fine-tuned
9
+ - reasoning
10
+ - 3b
11
+ - menda
12
+ - chat
13
+ - transformers
14
+ library_name: transformers
15
  datasets:
16
  - gsm8k
17
  model-index:
18
  - name: Menda-3B-250
19
  results:
20
  - task:
21
+ type: text-generation
22
+ name: Text Generation
23
+ dataset:
24
+ type: arc-challenge
25
  name: ARC-Challenge
26
  metrics:
27
  - name: Accuracy
28
  type: accuracy
29
  value: 50.0
30
  - task:
31
+ type: text-generation
32
+ name: Text Generation
33
+ dataset:
34
+ type: boolq
35
  name: BoolQ
36
  metrics:
37
  - name: Accuracy
38
  type: accuracy
39
  value: 80.0
40
  - task:
41
+ type: text-generation
42
+ name: Text Generation
43
+ dataset:
44
+ type: hellaswag
45
  name: HellaSwag
46
  metrics:
47
  - name: Accuracy
48
  type: accuracy
49
  value: 40.0
50
  - task:
51
+ type: text-generation
52
+ name: Text Generation
53
+ dataset:
54
  type: mmlu
55
+ name: MMLU (Overall)
56
  metrics:
57
+ - name: Accuracy
58
  type: accuracy
59
  value: 68.95
60
  ---
61
 
62
+ # Menda-3B-250: GRPO-Tuned Qwen2.5 Model
63
 
64
+ Menda-3B-250 is a fine-tuned version of Qwen2.5-3B-Instruct, trained with GRPO (Group Relative Policy Optimization) for 250 steps. This model shows improved performance on reasoning benchmarks compared to the base model.
65
 
66
  ## Model Details
67
 
68
  - **Base Model**: Qwen/Qwen2.5-3B-Instruct
69
  - **Training Method**: GRPO (Group Relative Policy Optimization)
70
  - **Training Steps**: 250
71
+ - **Parameters**: 3 billion
72
  - **Context Length**: 32K tokens
73
  - **Training Data**: GSM8K (mathematical reasoning)
74
+ - **Chat Template**: Uses the Qwen2 chat template
75
+
76
+ ## Chat Format
77
+
78
+ This model uses the standard Qwen2 chat template. For best results when using the model directly, format your prompts as follows:
79
+
80
+ ```
81
+ <|im_start|>system
82
+ You are a helpful AI assistant.<|im_end|>
83
+ <|im_start|>user
84
+ Your question here<|im_end|>
85
+ <|im_start|>assistant
86
+ ```
87
+
88
+ When using the model through the Hugging Face Transformers library, apply the chat template with the tokenizer's `apply_chat_template` method:
89
+
90
+ ```python
91
+ from transformers import AutoModelForCausalLM, AutoTokenizer
92
+
93
+ model_name = "weathermanj/Menda-3B-250"
94
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
95
+ model = AutoModelForCausalLM.from_pretrained(model_name)
96
+
97
+ messages = [
98
+ {"role": "system", "content": "You are a helpful AI assistant."},
99
+ {"role": "user", "content": "Explain the concept of machine learning in simple terms."}
100
+ ]
101
 
102
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
103
+ inputs = tokenizer(prompt, return_tensors="pt")
104
+ outputs = model.generate(**inputs, max_length=300)
105
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
106
+ print(response)
107
+ ```
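To make the layout above concrete, here is a minimal sketch of what the template expands to, assuming the plain ChatML structure shown in the Chat Format section. `build_chatml_prompt` is a hypothetical helper written for illustration, not part of Transformers, and it ignores extras the tokenizer's real template may handle (such as a default system prompt or tool calls). The tokenizer's own `apply_chat_template` remains the authoritative source.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in ChatML layout.

    Hypothetical helper for illustration only; it mirrors the format
    shown in the README, not the tokenizer's full template.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # An open assistant turn tells the model to start answering.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Your question here"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The trailing open assistant turn is what `add_generation_prompt=True` contributes; without it the model sees a finished conversation and may not produce an answer.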
108
 
109
+ ## Benchmark Results
110
 
111
+ Menda-3B-250 has been evaluated on several standard benchmarks:
112
 
113
+ | Benchmark | Task Type | Accuracy |
114
+ |-----------|-----------|----------|
115
+ | ARC-Challenge | Scientific Reasoning | 50.0% |
116
+ | BoolQ | Reading Comprehension | 80.0% |
117
+ | HellaSwag | Commonsense Reasoning | 40.0% |
118
+ | Lambada | Text Completion | 70.0% |
119
+ | PIQA | Physical Reasoning | 90.0% |
120
+ | Winogrande | Commonsense Reasoning | 90.0% |
121
 
122
  ### MMLU Performance
123
 
 
136
  - **Efficient Training**: Achieves impressive results with minimal training (only 250 steps).
137
  - **Balanced Capabilities**: Maintains strong performance across diverse tasks without significant trade-offs.
138
 
139
+ ## Usage Examples
140
+
141
+ ### Basic Usage with Transformers
142
 
143
  ```python
144
  from transformers import AutoModelForCausalLM, AutoTokenizer
145
 
146
  model_name = "weathermanj/Menda-3B-250"
147
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
148
  model = AutoModelForCausalLM.from_pretrained(
149
  model_name,
150
  torch_dtype="auto",
151
  device_map="auto"
152
  )
153
+
154
+ prompt = "Explain the concept of machine learning in simple terms."
155
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
156
+ outputs = model.generate(**inputs, max_length=300)
157
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
158
+ print(response)
159
+ ```
160
+
161
+ ### Chat Usage with Transformers
162
+
163
+ ```python
164
+ from transformers import AutoModelForCausalLM, AutoTokenizer
165
+
166
+ model_name = "weathermanj/Menda-3B-250"
167
  tokenizer = AutoTokenizer.from_pretrained(model_name)
168
+ model = AutoModelForCausalLM.from_pretrained(
169
+ model_name,
170
+ torch_dtype="auto",
171
+ device_map="auto"
172
+ )
173
 
 
174
  messages = [
175
+ {"role": "system", "content": "You are a helpful AI assistant."},
176
+ {"role": "user", "content": "Give me a short introduction to large language models."}
177
  ]
178
  text = tokenizer.apply_chat_template(
179
  messages,
 
194
  print(response)
195
  ```
196
 
197
+ ### Using with Ollama
198
+
199
+ You can also use this model with Ollama by converting it to GGUF format:
200
+
201
+ ```bash
202
+ # Convert to GGUF with convert_hf_to_gguf.py from a llama.cpp checkout (expects a local copy of the model)
203
+ python convert_hf_to_gguf.py ./Menda-3B-250 --outfile menda-3b-250.gguf
204
+
205
+ # Create Ollama model
206
+ cat > Modelfile << EOF
207
+ FROM menda-3b-250.gguf
208
+ TEMPLATE """{{ .Prompt }}"""
209
+ PARAMETER temperature 0.7
210
+ PARAMETER top_p 0.9
211
+ PARAMETER top_k 40
212
+ EOF
213
+
214
+ ollama create menda-3b-250 -f Modelfile
215
+ ollama run menda-3b-250
216
+ ```
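The Modelfile above passes the raw prompt through unchanged, so the ChatML markers from the Chat Format section are not applied. As a hedged sketch of one alternative, the snippet below renders a Modelfile whose `TEMPLATE` wraps the system and user text in ChatML. `render_modelfile` and `CHATML_TEMPLATE` are hypothetical names introduced here, the stop token is an assumption based on the chat format shown in the README, and the sampling values are copied from the Modelfile above.

```python
# Ollama template (Go template syntax) that wraps turns in ChatML markers.
# Assumption: the model expects the same layout as the README's Chat Format.
CHATML_TEMPLATE = (
    "{{ if .System }}<|im_start|>system\n"
    "{{ .System }}<|im_end|>\n"
    "{{ end }}<|im_start|>user\n"
    "{{ .Prompt }}<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def render_modelfile(gguf_path, temperature=0.7, top_p=0.9, top_k=40):
    """Return Modelfile text for a local GGUF file (hypothetical helper)."""
    lines = [
        f"FROM {gguf_path}",
        f'TEMPLATE """{CHATML_TEMPLATE}"""',
        # Stop generating once the assistant turn is closed.
        'PARAMETER stop "<|im_end|>"',
        f"PARAMETER temperature {temperature}",
        f"PARAMETER top_p {top_p}",
        f"PARAMETER top_k {top_k}",
    ]
    return "\n".join(lines) + "\n"


modelfile_text = render_modelfile("menda-3b-250.gguf")
print(modelfile_text)
```

Writing `modelfile_text` to a file named `Modelfile` and then running `ollama create menda-3b-250 -f Modelfile` follows the same steps as the shell example.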
217
+
218
  ## Training Configuration
219
 
220
  The model was trained using the GRPO methodology with the following configuration:
 
228
 
229
  ## License
230
 
231
+ This model inherits the license of the base Qwen2.5-3B-Instruct model. Please refer to the [Qwen2.5-3B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE) for details.