Update README.md

tags:
- llama
- trl
- grpo
license: mit
language:
- en
datasets:
- openai/gsm8k
---

## Model Card for Azzedde/llama3.1-8b-reasoning-grpo

### Model Details

**Model Description**

This is the model card for **llama3.1-8b-reasoning-grpo**, a fine-tuned version of Meta's Llama-3.1-8B-Instruct optimized for **complex reasoning and logical inference**. The model was trained with **Unsloth** using **LoRA fine-tuning** and **vLLM for fast inference**, enabling enhanced performance on **structured logical tasks, multi-hop reasoning, and analytical problem-solving**.

- **Developed by:** Azzedine (GitHub: Azzedde)
- **Funded by [optional]:** N/A
- **Shared by [optional]:** Azzedde
- **Model type:** Large Language Model (LLM) optimized for reasoning tasks
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model [optional]:** Meta-Llama-3.1-8B-Instruct

### Model Sources

- **Repository:** [Hugging Face](https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo)
- **Paper [optional]:** N/A
- **Demo [optional]:** N/A

### Uses

#### Direct Use

This model is designed for **complex reasoning and logical inference** in:

- Analytical problem-solving
- Multi-step deduction
- Automated reasoning systems
- Advanced question-answering tasks

#### Downstream Use [optional]

- AI-driven **decision support systems**
- Enhancing **multi-step AI reasoning chains**
- Improving **LLM-based tutoring systems**

#### Out-of-Scope Use

- General NLP tasks unrelated to structured reasoning
- Tasks requiring high factual recall outside logical reasoning

### Bias, Risks, and Limitations

- The model may **hallucinate logical steps** when reasoning about **highly complex or ambiguous problems**.
- It does not guarantee **real-world factual accuracy**, so users should **verify its logical conclusions**.
- The model's reasoning **depends on its fine-tuning dataset** and may require additional domain adaptation.

### Recommendations

Users should be aware of:

- The need to **validate logical outputs** against ground-truth sources.
- The potential for **biases in reasoning patterns**.
- The benefit of **fine-tuning on domain-specific reasoning datasets** for best performance.

### How to Get Started with the Model

Use the following code to load the model and run an example prompt (adjust `max_seq_length` and `load_in_4bit` to your hardware):

```python
from unsloth import FastLanguageModel

# Unsloth returns the model and tokenizer from a single call
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Azzedde/llama3.1-8b-reasoning-grpo",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

# Example inference
reasoning_prompt = """Solve the following logical problem:

If all cats are mammals, and some mammals are not pets, does it follow that some cats are not pets? Explain your reasoning.
"""

inputs = tokenizer(reasoning_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Training Details

**Training Data**: The model was fine-tuned on a **custom reasoning dataset (2024v1)**.

**Training Procedure**:

- **Preprocessing**: Tokenized using **structured logic templates**.
- **Training Hyperparameters**:
  - `batch_size=4`
  - `gradient_accumulation_steps=8`
  - `num_train_epochs=3`
  - `learning_rate=2e-4`
  - `fp16=True`
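
The exact training script is not published with this card. As a rough illustration of how the hyperparameters above fit together, the sketch below wires up GRPO training with Unsloth and TRL on `openai/gsm8k` (the dataset listed in the metadata); the base checkpoint, LoRA settings, generation lengths, and the toy reward function are assumptions for illustration, not the configuration actually used.

```python
# Illustrative GRPO fine-tuning sketch -- not the exact script behind this model.
# Assumed environment: pip install unsloth trl datasets
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Load the base model in 4-bit and attach LoRA adapters (settings are assumed)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B-Instruct",  # assumed base checkpoint
    max_seq_length=1024,
    load_in_4bit=True,
    fast_inference=True,  # Unsloth's vLLM-backed generation for GRPO rollouts
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                 # assumed LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# GRPOTrainer expects a "prompt" column; GSM8K ships "question"/"answer"
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

def brevity_reward(completions, **kwargs):
    """Toy reward that prefers shorter completions (placeholder for a real
    correctness/format reward)."""
    return [-len(completion) / 1000.0 for completion in completions]

args = GRPOConfig(
    output_dir="llama3.1-8b-reasoning-grpo",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    fp16=True,
    num_generations=4,          # completions sampled per prompt for each group
    max_prompt_length=256,
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[brevity_reward],
    args=args,
    train_dataset=dataset,
)
trainer.train()
```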

### Evaluation

#### Testing Data

- Used **structured reasoning datasets** from various logic-based tasks.

#### Factors

- Model performance was measured on **logical consistency and deductive accuracy**.

#### Metrics

- **Logical Entailment Accuracy** (LEA)
- **Stepwise Deduction Success Rate** (SDSR)

#### Results

- **High accuracy in single-hop reasoning tasks**.
- **Struggles with highly ambiguous logical chains**.
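
The card does not spell out how LEA and SDSR are computed. As an unofficial sanity check, the sketch below scores exact final-answer accuracy on a small slice of GSM8K; the prompt wording, sample size, and decoding settings are illustrative assumptions rather than the evaluation protocol used here.

```python
# Illustrative evaluation sketch: exact final-answer accuracy on GSM8K.
import re

from datasets import load_dataset
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Azzedde/llama3.1-8b-reasoning-grpo",
    max_seq_length=1024,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

# Small slice for a quick check; use the full test split for a real number
test_set = load_dataset("openai/gsm8k", "main", split="test").select(range(20))

def final_number(text: str) -> str:
    """Return the last number mentioned in a string, or '' if none is found."""
    numbers = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return numbers[-1] if numbers else ""

correct = 0
for example in test_set:
    prompt = example["question"] + "\nAnswer with the final number."
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=256)
    prediction = tokenizer.decode(output[0], skip_special_tokens=True)
    gold = example["answer"].split("####")[-1]  # GSM8K puts the answer after '####'
    correct += int(final_number(prediction) == final_number(gold))

print(f"Exact final-answer accuracy: {correct / len(test_set):.2%}")
```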

### Environmental Impact

- **Hardware Type:** Tesla T4 (Google Colab)
- **Hours Used:** ~3.5 hours (≈212 minutes)
- **Cloud Provider:** Google Colab
- **Compute Region:** N/A

### Technical Specifications

#### Model Architecture and Objective

- Based on **Llama-3.1 8B** with **LoRA fine-tuning** and **vLLM fast inference**.
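
Outside of Unsloth, the model can in principle also be served with vLLM directly. The short sketch below assumes the repository hosts full merged weights; if it only ships LoRA adapters, merge them first or load them through vLLM's LoRA support instead.

```python
# Illustrative vLLM serving sketch (assumes merged full weights in the repo).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Azzedde/llama3.1-8b-reasoning-grpo",
    dtype="float16",      # T4-class GPUs do not support bfloat16
    max_model_len=2048,
)
params = SamplingParams(temperature=0.2, max_tokens=256)

prompt = (
    "Solve the following logical problem:\n"
    "If all cats are mammals, and some mammals are not pets, "
    "does it follow that some cats are not pets? Explain your reasoning."
)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```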

#### Compute Infrastructure

- Fine-tuned using **Unsloth** for efficient training and inference.

#### Hardware

- **GPU:** Tesla T4
- **Max Reserved Memory:** ~8 GB

#### Software

- **Libraries Used:** `unsloth`, `transformers`, `trl`, `datasets`

### Citation [optional]

**BibTeX:**

```bibtex
@misc{llama3.1-8b-grpo,
  author = {Azzedde},
  title  = {Llama3.1-8B-GRPO: A Logical Reasoning LLM},
  year   = {2025},
  url    = {https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo}
}
```

**APA:**

Azzedde. (2025). *Llama3.1-8B-GRPO: A Logical Reasoning LLM*. Retrieved from [Hugging Face](https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo)

### More Information

For questions, reach out via **Hugging Face discussions** or GitHub issues.

### Model Card Authors

- **Azzedde** (GitHub: Azzedde)

### Model Card Contact

**Contact:** [Hugging Face Profile](https://huggingface.co/Azzedde)