Model Card for Azzedde/llama3.1-8b-reasoning-grpo

Model Details

Model Description
This is the model card for llama3.1-8b-reasoning-grpo, a fine-tuned version of Meta's Llama-3.1-8B-Instruct optimized for complex reasoning and logical inference. The model was trained with GRPO (Group Relative Policy Optimization) using Unsloth and LoRA fine-tuning, with vLLM available for fast inference, and targets structured logical tasks, multi-hop reasoning, and analytical problem-solving.

Developed by: Azzedine (GitHub: Azzedde)
Funded by [optional]: N/A
Shared by [optional]: Azzedde
Model Type: Large Language Model (LLM) optimized for reasoning tasks
Language(s) (NLP): English
License: MIT
Finetuned from model [optional]: Meta-Llama-3.1-8B-Instruct

Model Sources

Repository: https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo
Paper [optional]: N/A
Demo [optional]: N/A

Uses

Direct Use

This model is designed for complex reasoning and logical inference in:

  • Analytical problem-solving
  • Multi-step deduction
  • Automated reasoning systems
  • Advanced question-answering tasks

Downstream Use [optional]

  • AI-driven decision support systems
  • Enhancing multi-step AI reasoning chains
  • Improving LLM-based tutoring systems

Out-of-Scope Use

  • General NLP tasks unrelated to structured reasoning
  • Tasks requiring high factual recall outside logical reasoning

Bias, Risks, and Limitations

  • The model may hallucinate logical steps when reasoning about highly complex or ambiguous problems.
  • It does not possess real-world factual accuracy, meaning users should verify logical conclusions.
  • The model's reasoning is dependent on its fine-tuned dataset and may require additional domain adaptation.

Recommendations

Users should be aware of:

  • The need to validate logical outputs against ground-truth sources.
  • The potential for biases in reasoning patterns.
  • The benefit of fine-tuning on domain-specific reasoning datasets for best performance.

How to Get Started with the Model

Use the following code to load and use the model:

from unsloth import FastLanguageModel

# FastLanguageModel.from_pretrained returns both the model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Azzedde/llama3.1-8b-reasoning-grpo",
    max_seq_length=2048,   # adjust to your context needs
    load_in_4bit=True,     # optional; fits comfortably on a single T4
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

# Example inference
reasoning_prompt = """Solve the following logical problem:

If all cats are mammals, and some mammals are not pets, does it follow that some cats are not pets? Explain your reasoning.
"""

inputs = tokenizer(reasoning_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
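
Because the base model is instruction-tuned, prompts may behave better when passed through the Llama 3.1 chat template. The variant below reuses the objects from the snippet above; the system prompt is illustrative, not part of the released model.

messages = [
    {"role": "system", "content": "You are a careful logical reasoner. Think step by step."},
    {"role": "user", "content": reasoning_prompt},
]

# apply_chat_template wraps the conversation in the model's chat markers
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(inputs, max_new_tokens=256, use_cache=True)
# decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))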

Training Details

Training Data: The model was fine-tuned on a custom reasoning dataset (2024v1).
Training Procedure:

  • Preprocessing: Tokenized using structured logic templates.
  • Training Hyperparameters:
    • batch_size=4
    • gradient_accumulation_steps=8
    • num_train_epochs=3
    • learning_rate=2e-4
    • fp16=True
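
These hyperparameters map naturally onto TRL's GRPOTrainer. The sketch below is illustrative only, not the exact training script used for this model: it assumes the trl GRPOConfig/GRPOTrainer API, a dataset with a "prompt" column (the file name is a placeholder for the custom 2024v1 reasoning set), and a purely hypothetical reward function.

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical reward: favors completions that state an explicit conclusion.
def format_reward(completions, **kwargs):
    return [1.0 if "therefore" in c.lower() else 0.0 for c in completions]

# Placeholder path for the custom reasoning dataset (2024v1).
train_dataset = load_dataset("json", data_files="reasoning_dataset_2024v1.jsonl", split="train")

config = GRPOConfig(
    output_dir="llama3.1-8b-reasoning-grpo",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    num_generations=4,  # assumption; must divide the effective batch size
)

trainer = GRPOTrainer(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    reward_funcs=format_reward,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()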

Evaluation

Testing Data

  • Used structured reasoning datasets from various logic-based tasks.

Factors

  • Model performance was measured on logical consistency and deductive accuracy.

Metrics

  • Logical Entailment Accuracy (LEA)
  • Stepwise Deduction Success Rate (SDSR)
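
The card does not spell out how these metrics are computed. As a purely illustrative sketch, assuming LEA is exact-match accuracy over predicted entailment labels and SDSR is the fraction of problems whose every deduction step is judged valid:

# Illustrative metric helpers; the official LEA/SDSR definitions are not
# specified in this card, so treat these as assumptions.

def logical_entailment_accuracy(predicted_labels, gold_labels):
    """Fraction of problems where the predicted entailment label matches gold."""
    correct = sum(p == g for p, g in zip(predicted_labels, gold_labels))
    return correct / len(gold_labels)

def stepwise_deduction_success_rate(step_judgements):
    """Fraction of problems whose every intermediate deduction step was judged valid.

    step_judgements: one list of booleans per problem.
    """
    fully_correct = sum(all(steps) for steps in step_judgements)
    return fully_correct / len(step_judgements)

print(logical_entailment_accuracy(["yes", "no"], ["yes", "yes"]))       # 0.5
print(stepwise_deduction_success_rate([[True, True], [True, False]]))   # 0.5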

Results

  • High accuracy in single-hop reasoning tasks.
  • Struggles with highly ambiguous logical chains.

Environmental Impact

Hardware Type: Tesla T4 (Google Colab)
Hours Used: ~3.5 hours (≈212 minutes)
Cloud Provider: Google Colab
Compute Region: N/A

Technical Specifications

Model Architecture and Objective

  • Based on Llama-3.1 8B with LoRA fine-tuning and vLLM fast inference.
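
For higher-throughput serving, the model can also be loaded directly with vLLM. The snippet below is a minimal sketch assuming the standard vLLM offline-inference API and that the repository holds merged weights rather than bare LoRA adapters.

from vllm import LLM, SamplingParams

# Load the fine-tuned model into vLLM's offline inference engine.
llm = LLM(model="Azzedde/llama3.1-8b-reasoning-grpo")
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(
    ["If all cats are mammals, and some mammals are not pets, "
     "does it follow that some cats are not pets? Explain your reasoning."],
    params,
)
print(outputs[0].outputs[0].text)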

Compute Infrastructure

  • Fine-tuned using Unsloth for efficient training and inference.

Hardware

  • GPU: Tesla T4
  • Max Reserved Memory: ~8 GB

Software

  • Libraries Used: unsloth, transformers, TRL, datasets, vLLM

Citation [optional]

BibTeX:

@misc{llama3.1-8b-reasoning-grpo,
  author    = {Azzedde},
  title     = {Llama3.1-8B-GRPO: A Logical Reasoning LLM},
  year      = {2025},
  url       = {https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo}
}

APA:
Azzedde. (2025). Llama3.1-8B-GRPO: A logical reasoning LLM. Hugging Face. https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo

More Information

For questions, reach out via Hugging Face discussions or GitHub issues.

Model Card Authors

  • Azzedde (GitHub: Azzedde)

Model Card Contact

Contact: https://huggingface.co/Azzedde (Hugging Face profile)
