Update README.md

tags:
- llama
- trl
- grpo
license: mit
language:
- en
datasets:
- openai/gsm8k
---

## Model Card for Azzedde/llama3.1-8b-reasoning-grpo

### Model Details

**Model Description**

This is the model card for **llama3.1-8b-reasoning-grpo**, a fine-tuned version of Meta's Llama-3.1-8B-Instruct optimized for **complex reasoning and logical inference**. The model was trained with **Unsloth** using **LoRA fine-tuning** and **vLLM for fast inference**, enabling enhanced performance on **structured logical tasks, multi-hop reasoning, and analytical problem-solving**.

- **Developed by:** Azzedine (GitHub: Azzedde)
- **Funded by [optional]:** N/A
- **Shared by [optional]:** Azzedde
- **Model type:** Large Language Model (LLM) optimized for reasoning tasks
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model [optional]:** Meta-Llama-3.1-8B-Instruct

### Model Sources

- **Repository:** [Hugging Face](https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo)
- **Paper [optional]:** N/A
- **Demo [optional]:** N/A

### Uses

#### Direct Use

This model is designed for **complex reasoning and logical inference** in:

- Analytical problem-solving
- Multi-step deduction
- Automated reasoning systems
- Advanced question-answering tasks

#### Downstream Use [optional]

- AI-driven **decision support systems**
- Enhancing **multi-step AI reasoning chains**
- Improving **LLM-based tutoring systems**

#### Out-of-Scope Use

- General NLP tasks unrelated to structured reasoning
- Tasks requiring high factual recall outside logical reasoning

### Bias, Risks, and Limitations

- The model may **hallucinate logical steps** when reasoning about **highly complex or ambiguous problems**.
- It does not guarantee **real-world factual accuracy**, so users should **verify its logical conclusions**.
- The model's reasoning **depends on its fine-tuning dataset** and may require additional domain adaptation.

### Recommendations

Users should be aware of:

- The need to **validate logical outputs** against ground-truth sources.
- The potential for **biases in reasoning patterns**.
- The benefit of **fine-tuning on domain-specific reasoning datasets** for best performance.

### How to Get Started with the Model

Use the following code to load the model and run an example prompt (adjust `max_seq_length` and `load_in_4bit` to your hardware):

```python
from unsloth import FastLanguageModel

# Unsloth returns the model and tokenizer from a single call
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Azzedde/llama3.1-8b-reasoning-grpo",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

# Example inference
reasoning_prompt = """Solve the following logical problem:

If all cats are mammals, and some mammals are not pets, does it follow that some cats are not pets? Explain your reasoning.
"""

inputs = tokenizer(reasoning_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Training Details

**Training Data**: The model was fine-tuned on a **custom reasoning dataset (2024v1)**.

**Training Procedure**:

- **Preprocessing**: Tokenized using **structured logic templates**.
- **Training Hyperparameters**:
  - `batch_size=4`
  - `gradient_accumulation_steps=8`
  - `num_train_epochs=3`
  - `learning_rate=2e-4`
  - `fp16=True`
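
The exact training script is not published with this card. As a rough illustration of how the hyperparameters above fit together, the sketch below wires up GRPO training with Unsloth and TRL on `openai/gsm8k` (the dataset listed in the metadata); the base checkpoint, LoRA settings, generation lengths, and the toy reward function are assumptions for illustration, not the configuration actually used.

```python
# Illustrative GRPO fine-tuning sketch -- not the exact script behind this model.
# Assumed environment: pip install unsloth trl datasets
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Load the base model in 4-bit and attach LoRA adapters (settings are assumed)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.1-8B-Instruct",  # assumed base checkpoint
    max_seq_length=1024,
    load_in_4bit=True,
    fast_inference=True,  # Unsloth's vLLM-backed generation for GRPO rollouts
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                 # assumed LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# GRPOTrainer expects a "prompt" column; GSM8K ships "question"/"answer"
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

def brevity_reward(completions, **kwargs):
    """Toy reward that prefers shorter completions (placeholder for a real
    correctness/format reward)."""
    return [-len(completion) / 1000.0 for completion in completions]

args = GRPOConfig(
    output_dir="llama3.1-8b-reasoning-grpo",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    fp16=True,
    num_generations=4,          # completions sampled per prompt for each group
    max_prompt_length=256,
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[brevity_reward],
    args=args,
    train_dataset=dataset,
)
trainer.train()
```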

### Evaluation

#### Testing Data

- Used **structured reasoning datasets** from various logic-based tasks.

#### Factors

- Model performance was measured on **logical consistency and deductive accuracy**.

#### Metrics

- **Logical Entailment Accuracy** (LEA)
- **Stepwise Deduction Success Rate** (SDSR)

#### Results

- **High accuracy in single-hop reasoning tasks**.
- **Struggles with highly ambiguous logical chains**.
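
The card does not spell out how LEA and SDSR are computed. As an unofficial sanity check, the sketch below scores exact final-answer accuracy on a small slice of GSM8K; the prompt wording, sample size, and decoding settings are illustrative assumptions rather than the evaluation protocol used here.

```python
# Illustrative evaluation sketch: exact final-answer accuracy on GSM8K.
import re

from datasets import load_dataset
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Azzedde/llama3.1-8b-reasoning-grpo",
    max_seq_length=1024,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

# Small slice for a quick check; use the full test split for a real number
test_set = load_dataset("openai/gsm8k", "main", split="test").select(range(20))

def final_number(text: str) -> str:
    """Return the last number mentioned in a string, or '' if none is found."""
    numbers = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return numbers[-1] if numbers else ""

correct = 0
for example in test_set:
    prompt = example["question"] + "\nAnswer with the final number."
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=256)
    prediction = tokenizer.decode(output[0], skip_special_tokens=True)
    gold = example["answer"].split("####")[-1]  # GSM8K puts the answer after '####'
    correct += int(final_number(prediction) == final_number(gold))

print(f"Exact final-answer accuracy: {correct / len(test_set):.2%}")
```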

### Environmental Impact

- **Hardware Type:** Tesla T4 (Google Colab)
- **Hours Used:** ~3.5 hours (≈212 minutes)
- **Cloud Provider:** Google Colab
- **Compute Region:** N/A

### Technical Specifications

#### Model Architecture and Objective

- Based on **Llama-3.1 8B** with **LoRA fine-tuning** and **vLLM fast inference**.
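
Outside of Unsloth, the model can in principle also be served with vLLM directly. The short sketch below assumes the repository hosts full merged weights; if it only ships LoRA adapters, merge them first or load them through vLLM's LoRA support instead.

```python
# Illustrative vLLM serving sketch (assumes merged full weights in the repo).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Azzedde/llama3.1-8b-reasoning-grpo",
    dtype="float16",      # T4-class GPUs do not support bfloat16
    max_model_len=2048,
)
params = SamplingParams(temperature=0.2, max_tokens=256)

prompt = (
    "Solve the following logical problem:\n"
    "If all cats are mammals, and some mammals are not pets, "
    "does it follow that some cats are not pets? Explain your reasoning."
)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```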

#### Compute Infrastructure

- Fine-tuned using **Unsloth** for efficient training and inference.

#### Hardware

- **GPU:** Tesla T4
- **Max Reserved Memory:** ~8 GB

#### Software

- **Libraries Used:** `unsloth`, `transformers`, `trl`, `datasets`

### Citation [optional]

**BibTeX:**

```bibtex
@misc{llama3.1-8b-grpo,
  author = {Azzedde},
  title  = {Llama3.1-8B-GRPO: A Logical Reasoning LLM},
  year   = {2025},
  url    = {https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo}
}
```

**APA:**

Azzedde. (2025). *Llama3.1-8B-GRPO: A Logical Reasoning LLM*. Retrieved from [Hugging Face](https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo)

### More Information

For questions, reach out via **Hugging Face discussions** or GitHub issues.

### Model Card Authors

- **Azzedde** (GitHub: Azzedde)

### Model Card Contact

**Contact:** [Hugging Face Profile](https://huggingface.co/Azzedde)