Model Card for Azzedde/llama3.1-8b-reasoning-grpo

Model Details

Model Description
This is the model card for llama3.1-8b-reasoning-grpo, a fine-tuned version of Meta's Llama-3.1-8B-Instruct optimized for complex reasoning and logical inference. The model was trained with GRPO (Group Relative Policy Optimization) using Unsloth and LoRA fine-tuning, with vLLM available for fast inference, and targets structured logical tasks, multi-hop reasoning, and analytical problem-solving.

Developed by: Azzedine (GitHub: Azzedde)
Funded by [optional]: N/A
Shared by [optional]: Azzedde
Model Type: Large Language Model (LLM) optimized for reasoning tasks
Language(s) (NLP): English
License: MIT
Finetuned from model [optional]: Meta-Llama-3.1-8B-Instruct

Model Sources

Repository: https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo
Paper [optional]: N/A
Demo [optional]: N/A

Uses

Direct Use

This model is designed for complex reasoning and logical inference in:

  • Analytical problem-solving
  • Multi-step deduction
  • Automated reasoning systems
  • Advanced question-answering tasks

Downstream Use [optional]

  • AI-driven decision support systems
  • Enhancing multi-step AI reasoning chains
  • Improving LLM-based tutoring systems

Out-of-Scope Use

  • General NLP tasks unrelated to structured reasoning
  • Tasks requiring high factual recall outside logical reasoning

Bias, Risks, and Limitations

  • The model may hallucinate logical steps when reasoning about highly complex or ambiguous problems.
  • It does not possess real-world factual accuracy, meaning users should verify logical conclusions.
  • The model's reasoning is dependent on its fine-tuned dataset and may require additional domain adaptation.

Recommendations

Users should be aware of:

  • The need to validate logical outputs against ground-truth sources.
  • The potential for biases in reasoning patterns.
  • The benefit of fine-tuning on domain-specific reasoning datasets for best performance.

How to Get Started with the Model

Use the following code to load and use the model:

from unsloth import FastLanguageModel

# FastLanguageModel.from_pretrained returns both the model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Azzedde/llama3.1-8b-reasoning-grpo",
    max_seq_length=2048,   # adjust to your context needs
    load_in_4bit=True,     # optional; fits comfortably on a single T4
)
FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

# Example inference
reasoning_prompt = """Solve the following logical problem:

If all cats are mammals, and some mammals are not pets, does it follow that some cats are not pets? Explain your reasoning.
"""

inputs = tokenizer(reasoning_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
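
Because the base model is instruction-tuned, prompts may behave better when passed through the Llama 3.1 chat template. The variant below reuses the objects from the snippet above; the system prompt is illustrative, not part of the released model.

messages = [
    {"role": "system", "content": "You are a careful logical reasoner. Think step by step."},
    {"role": "user", "content": reasoning_prompt},
]

# apply_chat_template wraps the conversation in the model's chat markers
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(inputs, max_new_tokens=256, use_cache=True)
# decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))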

Training Details

Training Data: The model was fine-tuned on a custom reasoning dataset (2024v1).
Training Procedure:

  • Preprocessing: Tokenized using structured logic templates.
  • Training Hyperparameters:
    • batch_size=4
    • gradient_accumulation_steps=8
    • num_train_epochs=3
    • learning_rate=2e-4
    • fp16=True
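
These hyperparameters map naturally onto TRL's GRPOTrainer. The sketch below is illustrative only, not the exact training script used for this model: it assumes the trl GRPOConfig/GRPOTrainer API, a dataset with a "prompt" column (the file name is a placeholder for the custom 2024v1 reasoning set), and a purely hypothetical reward function.

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical reward: favors completions that state an explicit conclusion.
def format_reward(completions, **kwargs):
    return [1.0 if "therefore" in c.lower() else 0.0 for c in completions]

# Placeholder path for the custom reasoning dataset (2024v1).
train_dataset = load_dataset("json", data_files="reasoning_dataset_2024v1.jsonl", split="train")

config = GRPOConfig(
    output_dir="llama3.1-8b-reasoning-grpo",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    num_generations=4,  # assumption; must divide the effective batch size
)

trainer = GRPOTrainer(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    reward_funcs=format_reward,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()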

Evaluation

Testing Data

  • Used structured reasoning datasets from various logic-based tasks.

Factors

  • Model performance was measured on logical consistency and deductive accuracy.

Metrics

  • Logical Entailment Accuracy (LEA)
  • Stepwise Deduction Success Rate (SDSR)
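
The card does not spell out how these metrics are computed. As a purely illustrative sketch, assuming LEA is exact-match accuracy over predicted entailment labels and SDSR is the fraction of problems whose every deduction step is judged valid:

# Illustrative metric helpers; the official LEA/SDSR definitions are not
# specified in this card, so treat these as assumptions.

def logical_entailment_accuracy(predicted_labels, gold_labels):
    """Fraction of problems where the predicted entailment label matches gold."""
    correct = sum(p == g for p, g in zip(predicted_labels, gold_labels))
    return correct / len(gold_labels)

def stepwise_deduction_success_rate(step_judgements):
    """Fraction of problems whose every intermediate deduction step was judged valid.

    step_judgements: one list of booleans per problem.
    """
    fully_correct = sum(all(steps) for steps in step_judgements)
    return fully_correct / len(step_judgements)

print(logical_entailment_accuracy(["yes", "no"], ["yes", "yes"]))       # 0.5
print(stepwise_deduction_success_rate([[True, True], [True, False]]))   # 0.5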

Results

  • High accuracy in single-hop reasoning tasks.
  • Struggles with highly ambiguous logical chains.

Environmental Impact

Hardware Type: Tesla T4 (Google Colab)
Hours Used: ~3.5 hours (≈212 minutes)
Cloud Provider: Google Colab
Compute Region: N/A

Technical Specifications

Model Architecture and Objective

  • Based on Llama-3.1 8B with LoRA fine-tuning and vLLM fast inference.
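
For higher-throughput serving, the model can also be loaded directly with vLLM. The snippet below is a minimal sketch assuming the standard vLLM offline-inference API and that the repository holds merged weights rather than bare LoRA adapters.

from vllm import LLM, SamplingParams

# Load the fine-tuned model into vLLM's offline inference engine.
llm = LLM(model="Azzedde/llama3.1-8b-reasoning-grpo")
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(
    ["If all cats are mammals, and some mammals are not pets, "
     "does it follow that some cats are not pets? Explain your reasoning."],
    params,
)
print(outputs[0].outputs[0].text)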

Compute Infrastructure

  • Fine-tuned using Unsloth for efficient training and inference.

Hardware

  • GPU: Tesla T4
  • Max Reserved Memory: ~8 GB

Software

  • Libraries Used: unsloth, transformers, TRL, datasets, vLLM

Citation [optional]

BibTeX:

@misc{llama3.1-8b-reasoning-grpo,
  author    = {Azzedde},
  title     = {Llama3.1-8B-GRPO: A Logical Reasoning LLM},
  year      = {2025},
  url       = {https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo}
}

APA:
Azzedde. (2025). Llama3.1-8B-GRPO: A logical reasoning LLM. Hugging Face. https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo

More Information

For questions, reach out via Hugging Face discussions or GitHub issues.

Model Card Authors

  • Azzedde (GitHub: Azzedde)

Model Card Contact

Contact: https://huggingface.co/Azzedde (Hugging Face profile)
