Upload README.md with huggingface_hub

README.md +119 -53

@@ -1,98 +1,123 @@
 ---
-language:
-  - en
 license: other
-base_model: Qwen/Qwen2.5-3B-Instruct
 tags:
   - qwen
   - grpo
-  - reinforcement-learning
-  - instruction-tuning
-  - mathematical-reasoning
-  - gsm8k
 datasets:
   - gsm8k
 model-index:
   - name: Menda-3B-250
     results:
       - task:
-          type: multiple-choice-qa
           name: ARC-Challenge
         metrics:
           - name: Accuracy
             type: accuracy
             value: 50.0
       - task:
-          type: multiple-choice-qa
           name: BoolQ
         metrics:
           - name: Accuracy
             type: accuracy
             value: 80.0
       - task:
-          type: multiple-choice-qa
           name: HellaSwag
         metrics:
           - name: Accuracy
             type: accuracy
             value: 40.0
       - task:
-          type: multiple-choice-qa
-          name: Lambada
-        metrics:
-          - name: Accuracy
-            type: accuracy
-            value: 70.0
-      - task:
-          type: multiple-choice-qa
-          name: PIQA
-        metrics:
-          - name: Accuracy
-            type: accuracy
-            value: 90.0
-      - task:
-          type: multiple-choice-qa
-          name: Winogrande
-        metrics:
-          - name: Accuracy
-            type: accuracy
-            value: 90.0
-      - task:
           type: mmlu
-          name: MMLU
         metrics:
-          - name: Average
             type: accuracy
             value: 68.95
 ---
-# Menda-3B-250
-Menda-3B-250 is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) using Group Relative Policy Optimization (GRPO). This model represents the 250-step checkpoint from the training process.
 ## Model Details
 - **Base Model**: Qwen/Qwen2.5-3B-Instruct
 - **Training Method**: GRPO (Group Relative Policy Optimization)
 - **Training Steps**: 250
-- **Parameters**: 3B
 - **Context Length**: 32K tokens
 - **Training Data**: GSM8K (mathematical reasoning)
-## Performance
-Based on extensive evaluation, the 250-step checkpoint shows surprisingly strong performance across multiple benchmarks:
-### Core Benchmarks (0-shot)
-| Benchmark | Score |
-|-----------|-------|
-| ARC-Challenge | 50.0% |
-| BoolQ | 80.0% |
-| HellaSwag | 40.0% |
-| Lambada | 70.0% |
-| PIQA | 90.0% |
-| Winogrande | 90.0% |
 ### MMLU Performance
@@ -111,24 +136,44 @@ Based on extensive evaluation, the 250-step checkpoint shows surprisingly strong
 - **Efficient Training**: Achieves impressive results with minimal training (only 250 steps).
 - **Balanced Capabilities**: Maintains strong performance across diverse tasks without significant trade-offs.
-## Usage
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "weathermanj/Menda-3B-250"
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
     torch_dtype="auto",
     device_map="auto"
 )
 tokenizer = AutoTokenizer.from_pretrained(model_name)
-prompt = "Give me a short introduction to large language models."
 messages = [
-    {"role": "system", "content": "You are a helpful assistant."},
-    {"role": "user", "content": prompt}
 ]
 text = tokenizer.apply_chat_template(
     messages,
@@ -149,6 +194,27 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)
 ```
 ## Training Configuration
 The model was trained using the GRPO methodology with the following configuration:
@@ -162,4 +228,4 @@ The model was trained using the GRPO methodology with the following configuratio
 ## License
-This model is subject to the license of the original Qwen2.5-3B-Instruct model.

 ---
+language: en
 license: other
 tags:
   - qwen
   - grpo
+  - instruct
+  - fine-tuned
+  - reasoning
+  - 3b
+  - menda
+  - chat
+  - transformers
+library_name: transformers
 datasets:
   - gsm8k
 model-index:
   - name: Menda-3B-250
     results:
       - task:
+          type: text-generation
+          name: Text Generation
+        dataset:
+          type: arc-challenge
           name: ARC-Challenge
         metrics:
           - name: Accuracy
             type: accuracy
             value: 50.0
       - task:
+          type: text-generation
+          name: Text Generation
+        dataset:
+          type: boolq
           name: BoolQ
         metrics:
           - name: Accuracy
             type: accuracy
             value: 80.0
       - task:
+          type: text-generation
+          name: Text Generation
+        dataset:
+          type: hellaswag
           name: HellaSwag
         metrics:
           - name: Accuracy
             type: accuracy
             value: 40.0
       - task:
+          type: text-generation
+          name: Text Generation
+        dataset:
           type: mmlu
+          name: MMLU (Overall)
         metrics:
+          - name: Accuracy
             type: accuracy
             value: 68.95
 ---
+# Menda-3B-250: GRPO-Tuned Qwen2.5 Model
+Menda-3B-250 is a fine-tuned version of Qwen2.5-3B-Instruct, trained with GRPO (Group Relative Policy Optimization) for 250 steps. This model shows improved performance on reasoning benchmarks compared to the base model.
 ## Model Details
 - **Base Model**: Qwen/Qwen2.5-3B-Instruct
 - **Training Method**: GRPO (Group Relative Policy Optimization)
 - **Training Steps**: 250
+- **Parameters**: 3 billion
 - **Context Length**: 32K tokens
 - **Training Data**: GSM8K (mathematical reasoning)
+- **Chat Template**: Uses the Qwen2 chat template
+## Chat Format
+This model uses the standard Qwen2 chat template. For best results when using the model directly, format your prompts as follows:
+```
+<|im_start|>system
+You are a helpful AI assistant.<|im_end|>
+<|im_start|>user
+Your question here<|im_end|>
+<|im_start|>assistant
+```
+When using the model through the Hugging Face Transformers library, the tokenizer's `apply_chat_template` method produces this format for you:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "weathermanj/Menda-3B-250"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name)
+messages = [
+    {"role": "system", "content": "You are a helpful AI assistant."},
+    {"role": "user", "content": "Explain the concept of machine learning in simple terms."}
+]
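+# add_generation_prompt=True appends the assistant header so the model starts its reply
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)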
+inputs = tokenizer(prompt, return_tensors="pt")
+outputs = model.generate(**inputs, max_length=300)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
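The chat layout shown in this section can also be reproduced by hand, which is useful for checking what a prompt looks like before it reaches the model. The following is a minimal sketch, not the template shipped with the tokenizer: `format_chatml` is a hypothetical helper name, and it assumes the format consists only of the `<|im_start|>` and `<|im_end|>` markers shown above. The real Qwen2 template may handle extra cases (such as a default system prompt), so `apply_chat_template` remains the authoritative path.

```python
from typing import Dict, List

# Special tokens taken from the chat format shown in the README.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def format_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render role/content messages into the ChatML-style layout (hypothetical helper)."""
    parts = []
    for message in messages:
        # Each turn is: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Your question here"},
]
prompt = format_chatml(messages)
print(prompt)
```

Comparing this string with the output of `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to confirm that a hand-built prompt matches what the tokenizer would produce.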
+## Benchmark Results
+Menda-3B-250 has been evaluated on several standard benchmarks:
+| Benchmark | Task Type | Accuracy |
+|-----------|-----------|----------|
+| ARC-Challenge | Scientific Reasoning | 50.0% |
+| BoolQ | Reading Comprehension | 80.0% |
+| HellaSwag | Commonsense Reasoning | 40.0% |
+| Lambada | Text Completion | 70.0% |
+| PIQA | Physical Reasoning | 90.0% |
+| Winogrande | Commonsense Reasoning | 90.0% |
 ### MMLU Performance
 - **Efficient Training**: Achieves impressive results with minimal training (only 250 steps).
 - **Balanced Capabilities**: Maintains strong performance across diverse tasks without significant trade-offs.
+## Usage Examples
+### Basic Usage with Transformers
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "weathermanj/Menda-3B-250"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
     torch_dtype="auto",
     device_map="auto"
 )
+prompt = "Explain the concept of machine learning in simple terms."
+# Move the inputs to the device that device_map="auto" placed the model on
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_length=300)
+response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(response)
+```
+### Chat Usage with Transformers
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "weathermanj/Menda-3B-250"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
 messages = [
+    {"role": "system", "content": "You are a helpful AI assistant."},
+    {"role": "user", "content": "Give me a short introduction to large language models."}
 ]
 text = tokenizer.apply_chat_template(
     messages,
 print(response)
 ```
+### Using with Ollama
+You can also use this model with Ollama by converting it to GGUF format:
+```bash
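+# Download the weights locally (the llama.cpp converter reads a local directory)
+huggingface-cli download weathermanj/Menda-3B-250 --local-dir Menda-3B-250
+# Convert to GGUF using convert_hf_to_gguf.py from a llama.cpp checkout
+python convert_hf_to_gguf.py Menda-3B-250 --outfile menda-3b-250.gguf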
+# Create Ollama model
+cat > Modelfile << EOF
+FROM menda-3b-250.gguf
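+TEMPLATE """{{ if .System }}<|im_start|>system
+{{ .System }}<|im_end|>
+{{ end }}<|im_start|>user
+{{ .Prompt }}<|im_end|>
+<|im_start|>assistant
+"""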
+PARAMETER temperature 0.7
+PARAMETER top_p 0.9
+PARAMETER top_k 40
+EOF
+ollama create menda-3b-250 -f Modelfile
+ollama run menda-3b-250
+```
 ## Training Configuration
 The model was trained using the GRPO methodology with the following configuration:
 ## License
+This model inherits the license of the base Qwen2.5-3B-Instruct model. Please refer to the [Qwen2.5-3B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE) for details.