Azzedde committed on
Commit 001bb8f · verified · 1 Parent(s): 334ec97

Update README.md

Files changed (1): README.md (+139 -7)
README.md CHANGED
@@ -7,17 +7,149 @@ tags:
  - llama
  - trl
  - grpo
- license: apache-2.0
+ license: mit
  language:
  - en
+ datasets:
+ - openai/gsm8k
  ---

- # Uploaded model
-
- - **Developed by:** Azzedde
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
-
- This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+
+ ## Model Card for Azzedde/llama3.1-8b-reasoning-grpo
 
 
+ ### Model Details
+ **Model Description**
+ This is the model card for **llama3.1-8b-reasoning-grpo**, a fine-tuned version of Meta’s Llama-3.1-8B-Instruct optimized for **complex reasoning and logical inference**. The model was trained with **Unsloth** using **LoRA fine-tuning** and **vLLM for fast inference**, improving performance on **structured logical tasks, multi-hop reasoning, and analytical problem-solving**.
+
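The paragraph above mentions LoRA fine-tuning with Unsloth and a vLLM-backed generation path. As a rough, non-authoritative sketch of what such a setup typically looks like (the base checkpoint is the one named in the previous README; the rank, alpha, and target modules are assumptions, not values stated in this card):

```python
from unsloth import FastLanguageModel

# Load the 4-bit base checkpoint named in the previous README and enable the
# vLLM-backed fast generation path used for rollouts.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
    fast_inference=True,
)

# Attach LoRA adapters; r, lora_alpha and target_modules are illustrative
# defaults, not values taken from this model card.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```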
+ **Developed by**: Azzedine (GitHub: Azzedde)
+ **Funded by**: N/A
+ **Shared by**: Azzedde
+ **Model Type**: Large Language Model (LLM) optimized for reasoning tasks
+ **Language(s) (NLP)**: English
+ **License**: MIT
+ **Finetuned from model**: Meta-Llama-3.1-8B-Instruct
+
+ ### Model Sources
+ **Repository**: [Hugging Face](https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo)
+ **Paper**: N/A
+ **Demo**: N/A
+
+ ### Uses
+ #### Direct Use
+ This model is designed for **complex reasoning and logical inference** in:
+ - Analytical problem-solving
+ - Multi-step deduction
+ - Automated reasoning systems
+ - Advanced question-answering tasks
+
+ #### Downstream Use
+ - AI-driven **decision support systems**
+ - Enhancing **multi-step AI reasoning chains**
+ - Improving **LLM-based tutoring systems**
+
+ #### Out-of-Scope Use
+ - General NLP tasks unrelated to structured reasoning
+ - Tasks requiring high factual recall outside logical reasoning
+
+ ### Bias, Risks, and Limitations
+ - The model may **hallucinate logical steps** when reasoning about **highly complex or ambiguous problems**.
+ - It does not guarantee **real-world factual accuracy**, so users should **verify its logical conclusions**.
+ - The model's reasoning **depends on its fine-tuning data** and may require additional domain adaptation.
+
+ ### Recommendations
+ Users should be aware of:
+ - The need to **validate logical outputs** against ground-truth sources.
+ - The potential for **biases in reasoning patterns**.
+ - The benefit of **fine-tuning on domain-specific reasoning datasets** for best performance.
+
+ ### How to Get Started with the Model
+ Use the following code to load the model and run inference:
+
+ ```python
+ from unsloth import FastLanguageModel
+
+ # Unsloth returns both the model and the tokenizer
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="Azzedde/llama3.1-8b-reasoning-grpo",
+     max_seq_length=2048,
+     load_in_4bit=True,
+ )
+ FastLanguageModel.for_inference(model)  # switch to optimized inference mode
+
+ # Example inference
+ reasoning_prompt = """Solve the following logical problem:
+
+ If all cats are mammals, and some mammals are not pets, does it follow that some cats are not pets? Explain your reasoning.
+ """
+
+ inputs = tokenizer(reasoning_prompt, return_tensors="pt").to("cuda")
+ outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
+ print(tokenizer.decode(outputs[0]))
+ ```
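Because the base model is an instruction-tuned Llama 3.1 checkpoint, it usually answers more cleanly when the prompt is wrapped in the chat template rather than passed as raw text. A small variant of the snippet above, reusing `model`, `tokenizer`, and `reasoning_prompt`; this is an illustrative suggestion, not part of the original card:

```python
# Optional: format the request with the Llama-3.1 chat template before generating.
messages = [{"role": "user", "content": reasoning_prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids, max_new_tokens=256, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```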
+ ### Training Details
+ **Training Data**: The model was fine-tuned on a **custom reasoning dataset (2024v1)**.
+ **Training Procedure**:
+ - **Preprocessing**: Tokenized using **structured logic templates**.
+ - **Training Hyperparameters** (see the configuration sketch below):
+   - `batch_size=4`
+   - `gradient_accumulation_steps=8`
+   - `num_train_epochs=3`
+   - `learning_rate=2e-4`
+   - `fp16=True`
+
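The `grpo` tag and the TRL dependency suggest these hyperparameters fed a GRPO run. Purely as a hedged sketch of how they might map onto TRL's `GRPOConfig`/`GRPOTrainer` (assuming a recent `trl` release): `output_dir`, the prompt-building step, and the reward function below are placeholders, not details taken from the card; `model` is the LoRA-wrapped model from the earlier sketch.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Prompt dataset: openai/gsm8k is listed in the card metadata; GRPOTrainer
# expects a "prompt" column, built here from the GSM8K question field.
train_dataset = (
    load_dataset("openai/gsm8k", "main", split="train")
    .map(lambda ex: {"prompt": ex["question"]})
)

def length_penalty_reward(completions, **kwargs):
    # Placeholder reward that favours concise completions. The actual reward
    # used for this model (e.g. answer correctness) is not described in the card.
    return [-len(c) / 1000.0 for c in completions]

# Hyperparameters from the list above; everything else is a placeholder.
training_args = GRPOConfig(
    output_dir="llama3.1-8b-reasoning-grpo",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
)

trainer = GRPOTrainer(
    model=model,                        # LoRA-wrapped model from the earlier sketch
    args=training_args,
    train_dataset=train_dataset,
    reward_funcs=[length_penalty_reward],
)
trainer.train()
```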
+ ### Evaluation
+ #### Testing Data
+ - Used **structured reasoning datasets** from various logic-based tasks.
+
+ #### Factors
+ - Model performance was measured on **logical consistency and deductive accuracy**.
+
+ #### Metrics
+ - **Logical Entailment Accuracy** (LEA)
+ - **Stepwise Deduction Success Rate** (SDSR)
+
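LEA and SDSR are not off-the-shelf metrics, so their exact definitions here are the author's. One plausible, simplified reading of "Logical Entailment Accuracy" is final-answer exact match; a minimal sketch assuming GSM8K-style `#### <answer>` reference strings (per the dataset listed in the metadata):

```python
def final_answer(text: str) -> str:
    # GSM8K references end with "#### <answer>"; take whatever follows the marker.
    return text.split("####")[-1].strip()

def logical_entailment_accuracy(predictions: list[str], references: list[str]) -> float:
    # Fraction of examples whose extracted final answer matches the reference.
    matches = sum(final_answer(p) == final_answer(r) for p, r in zip(predictions, references))
    return matches / len(references)
```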
+ #### Results
+ - **High accuracy in single-hop reasoning tasks**.
+ - **Struggles with highly ambiguous logical chains**.
+
+ ### Environmental Impact
+ **Hardware Type**: Tesla T4 (Google Colab)
+ **Hours Used**: ~3.5 hours (≈212 minutes)
+ **Cloud Provider**: Google Colab
+ **Compute Region**: N/A
+
+ ### Technical Specifications
+ #### Model Architecture and Objective
+ - Based on **Llama-3.1 8B** with **LoRA fine-tuning** and **vLLM fast inference**.
+
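For the "vLLM fast inference" claim above, a minimal sketch of serving the model through vLLM directly; this assumes the published weights are a merged, vLLM-compatible checkpoint (if only LoRA adapters are published, vLLM's LoRA options would be needed instead), and the dtype/context values are illustrative:

```python
from vllm import LLM, SamplingParams

# Assumes merged full-model weights; adjust dtype/max_model_len to your GPU.
llm = LLM(model="Azzedde/llama3.1-8b-reasoning-grpo", dtype="float16", max_model_len=2048)
sampling = SamplingParams(temperature=0.0, max_tokens=256)

prompt = ("If all cats are mammals, and some mammals are not pets, "
          "does it follow that some cats are not pets? Explain your reasoning.")
outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```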
+ #### Compute Infrastructure
+ - Fine-tuned using **Unsloth** for efficient training and inference.
+
+ #### Hardware
+ - **GPU**: Tesla T4
+ - **Max Reserved Memory**: ~8 GB
+
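The card does not say how the ~8 GB figure was measured; "max reserved memory" is the kind of number PyTorch reports after a run, and a typical check looks like the sketch below (an assumption about the measurement method, not a documented step):

```python
import torch

# Peak GPU memory reserved by the PyTorch allocator during the run, in GiB.
peak_gib = torch.cuda.max_memory_reserved() / 1024**3
print(f"Max reserved memory: {peak_gib:.1f} GB")
```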
+ #### Software
+ - **Libraries Used**: `unsloth`, `transformers`, `trl`, `datasets`
+
+ ### Citation
+ **BibTeX:**
+ ```
+ @misc{llama3.1-8b-grpo,
+   author = {Azzedde},
+   title  = {Llama3.1-8B-GRPO: A Logical Reasoning LLM},
+   year   = {2025},
+   url    = {https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo}
+ }
+ ```
+
+ **APA:**
+ Azzedde. (2025). *Llama3.1-8B-GRPO: A Logical Reasoning LLM*. Retrieved from [Hugging Face](https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo)
+
+ ### More Information
+ For questions, reach out via **Hugging Face discussions** or GitHub issues.
+
+ ### Model Card Authors
+ - **Azzedde** (GitHub: Azzedde)
+
+ ### Model Card Contact
+ **Contact**: [Hugging Face Profile](https://huggingface.co/Azzedde)