---
base_model: unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- grpo
license: mit
language:
- en
datasets:
- openai/gsm8k
---

## Model Card for Azzedde/llama3.1-8b-reasoning-grpo

### Model Details

**Model Description**

**llama3.1-8b-reasoning-grpo** is a fine-tuned version of Meta's Llama-3.1-8B-Instruct, optimized for **complex reasoning and logical inference**. The model was trained with **GRPO (Group Relative Policy Optimization)** using **Unsloth** with **LoRA fine-tuning** and **vLLM for fast inference**, targeting improved performance on **structured logical tasks, multi-hop reasoning, and analytical problem-solving**.

- **Developed by**: Azzedine (GitHub: Azzedde)
- **Shared by**: Azzedde
- **Model Type**: Large Language Model (LLM) optimized for reasoning tasks
- **Language(s) (NLP)**: English
- **License**: MIT
- **Finetuned from model**: Meta-Llama-3.1-8B-Instruct

### Model Sources

- **Repository**: [Hugging Face](https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo)

### Uses

#### Direct Use

This model is designed for **complex reasoning and logical inference** in:

- Analytical problem-solving
- Multi-step deduction
- Automated reasoning systems
- Advanced question-answering tasks

#### Downstream Use

- AI-driven **decision support systems**
- Enhancing **multi-step AI reasoning chains**
- Improving **LLM-based tutoring systems**

#### Out-of-Scope Use

- General NLP tasks unrelated to structured reasoning
- Tasks requiring high factual recall outside logical reasoning

### Bias, Risks, and Limitations

- The model may **hallucinate logical steps** when reasoning about **highly complex or ambiguous problems**.
- It is not optimized for **real-world factual accuracy**, so users should **verify its logical conclusions**.
- The model's reasoning **depends on its fine-tuning data** and may require additional domain adaptation.

### Recommendations

Users should be aware of:

- The need to **validate logical outputs** against ground-truth sources.
- The potential for **biases in reasoning patterns**.
- The benefit of **fine-tuning on domain-specific reasoning datasets** for best performance.

### How to Get Started with the Model

Use the following code to load the model and run inference:

```python
from unsloth import FastLanguageModel

# FastLanguageModel.from_pretrained returns both the model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Azzedde/llama3.1-8b-reasoning-grpo",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

# Example inference
reasoning_prompt = """Solve the following logical problem:
If all cats are mammals, and some mammals are not pets, does it follow that some cats are not pets?
Explain your reasoning.
"""

inputs = tokenizer(reasoning_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Training Details

**Training Data**: The model was fine-tuned on a **custom reasoning dataset (2024v1)**.

**Training Procedure**:

- **Preprocessing**: Prompts were tokenized using **structured logic templates**.
- **Training Hyperparameters**:
  - `batch_size=4`
  - `gradient_accumulation_steps=8`
  - `num_train_epochs=3`
  - `learning_rate=2e-4`
  - `fp16=True`
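The GRPO training script itself is not included in this repository. The sketch below shows how a comparable run could be set up with Unsloth and TRL's `GRPOTrainer` on `openai/gsm8k` (the dataset listed in the card metadata), reusing the hyperparameters above; the reward function, LoRA rank, `num_generations`, and sequence lengths are illustrative assumptions, not the exact settings behind this checkpoint.

```python
# Hedged sketch of a GRPO fine-tune with Unsloth + TRL on GSM8K.
# Reward function, LoRA rank, sequence lengths, and num_generations are
# illustrative assumptions, not the exact settings used for this checkpoint.
from unsloth import FastLanguageModel  # import unsloth first so it can patch transformers/TRL

import re

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Load the 4-bit base model and attach LoRA adapters (rank/alpha are assumptions).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# GSM8K: use the question as the prompt and keep the final number (after "####")
# as the reference answer for the reward function.
def extract_final_answer(text: str) -> str:
    return text.split("####")[-1].strip().replace(",", "")

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {
    "prompt": x["question"],
    "target": extract_final_answer(x["answer"]),
})

# Illustrative correctness reward: 1.0 if the last number in the completion
# matches the reference answer, else 0.0. Extra dataset columns arrive as kwargs.
def correctness_reward(completions, target, **kwargs):
    rewards = []
    for completion, ref in zip(completions, target):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
        rewards.append(1.0 if numbers and numbers[-1] == ref else 0.0)
    return rewards

training_args = GRPOConfig(
    output_dir="outputs",
    per_device_train_batch_size=4,   # batch_size=4 from the card
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    num_generations=4,               # completions sampled per prompt (assumption)
    max_prompt_length=256,           # assumption
    max_completion_length=256,       # assumption
    logging_steps=10,
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[correctness_reward],
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

With `num_generations=4`, each step samples several completions per prompt and favours the ones that score higher under the reward, which is what provides GRPO's group-relative baseline.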
### Evaluation

#### Testing Data

- Evaluated on **structured reasoning datasets** drawn from a range of logic-based tasks.

#### Factors

- Model performance was measured on **logical consistency and deductive accuracy**.

#### Metrics

- **Logical Entailment Accuracy (LEA)**
- **Stepwise Deduction Success Rate (SDSR)**

#### Results

- **High accuracy on single-hop reasoning tasks.**
- **Struggles with highly ambiguous logical chains.**

### Environmental Impact

- **Hardware Type**: Tesla T4 (Google Colab)
- **Hours Used**: ~3.5 hours (≈212 minutes)
- **Cloud Provider**: Google Colab
- **Compute Region**: N/A

### Technical Specifications

#### Model Architecture and Objective

- Based on **Llama-3.1 8B** with **LoRA fine-tuning** and **vLLM fast inference**.

#### Compute Infrastructure

- Fine-tuned using **Unsloth** for efficient training and inference.

#### Hardware

- **GPU**: Tesla T4
- **Max Reserved Memory**: ~8 GB

#### Software

- **Libraries used**: `unsloth`, `transformers`, `trl`, `datasets`

### Citation

**BibTeX:**

```bibtex
@misc{llama3.1-8b-reasoning-grpo,
  author = {Azzedde},
  title  = {Llama3.1-8B-GRPO: A Logical Reasoning LLM},
  year   = {2025},
  url    = {https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo}
}
```

**APA:**

Azzedde. (2025). *Llama3.1-8B-GRPO: A Logical Reasoning LLM*. Retrieved from [Hugging Face](https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo)

### More Information

For questions, reach out via **Hugging Face discussions** or GitHub issues.

### Model Card Authors

- **Azzedde** (GitHub: Azzedde)

### Model Card Contact

**Contact**: [Hugging Face Profile](https://huggingface.co/Azzedde)