# Model Card for Azzedde/llama3.1-8b-reasoning-grpo
## Model Details

### Model Description
This is the model card for **llama3.1-8b-reasoning-grpo**, a fine-tuned version of Meta's Llama-3.1-8B-Instruct optimized for complex reasoning and logical inference. The model was trained with Unsloth using LoRA fine-tuning and GRPO (Group Relative Policy Optimization), with vLLM for fast inference, improving performance on structured logical tasks, multi-hop reasoning, and analytical problem-solving.
- **Developed by:** Azzedine (GitHub: Azzedde)
- **Funded by:** N/A
- **Shared by:** Azzedde
- **Model type:** Large Language Model (LLM) optimized for reasoning tasks
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** Meta-Llama-3.1-8B-Instruct
### Model Sources

- **Repository:** https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo
- **Paper:** N/A
- **Demo:** N/A
## Uses

### Direct Use
This model is designed for complex reasoning and logical inference, including:
- Analytical problem-solving
- Multi-step deduction
- Automated reasoning systems
- Advanced question-answering tasks
### Downstream Use
- AI-driven decision support systems
- Enhancing multi-step AI reasoning chains
- Improving LLM-based tutoring systems
### Out-of-Scope Use
- General NLP tasks unrelated to structured reasoning
- Tasks requiring high factual recall outside logical reasoning
## Bias, Risks, and Limitations
- The model may hallucinate logical steps when reasoning about highly complex or ambiguous problems.
- Its outputs are not guaranteed to be factually accurate; conclusions should be verified against reliable sources.
- The model's reasoning patterns depend on its fine-tuning dataset and may require additional domain adaptation.
### Recommendations
Users should be aware of:
- The need to validate logical outputs against ground-truth sources.
- The potential for biases in reasoning patterns.
- The benefit of fine-tuning on domain-specific reasoning datasets for best performance.
## How to Get Started with the Model

Use the following code to load the model and run a sample inference:
```python
from unsloth import FastLanguageModel

# FastLanguageModel.from_pretrained returns both the model and the tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Azzedde/llama3.1-8b-reasoning-grpo",
    max_seq_length=2048,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's optimized inference mode

# Example inference
reasoning_prompt = """Solve the following logical problem:
If all cats are mammals, and some mammals are not pets, does it follow that some cats are not pets? Explain your reasoning.
"""

inputs = tokenizer(reasoning_prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
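The card also mentions vLLM for fast inference. The exact loading options used by the author are not documented, but as a minimal sketch, recent Unsloth releases expose a `fast_inference` flag that backs generation with vLLM; the `gpu_memory_utilization` and sampling values below are illustrative assumptions:

```python
from unsloth import FastLanguageModel
from vllm import SamplingParams

# Load with the vLLM backend enabled; flag names follow recent Unsloth releases.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Azzedde/llama3.1-8b-reasoning-grpo",
    max_seq_length=2048,
    fast_inference=True,          # back generation with vLLM
    gpu_memory_utilization=0.6,   # illustrative value; tune for your GPU
)

sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)
prompt = "If all cats are mammals, and some mammals are not pets, does it follow that some cats are not pets?"
outputs = model.fast_generate([prompt], sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```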
## Training Details

### Training Data

The model was fine-tuned on a custom reasoning dataset (2024v1).
### Training Procedure

- **Preprocessing:** Tokenized using structured logic templates.
- **Training Hyperparameters** (see the sketch below):
  - `batch_size=4`
  - `gradient_accumulation_steps=8`
  - `num_train_epochs=3`
  - `learning_rate=2e-4`
  - `fp16=True`
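The model name and the TRL dependency suggest the reward-based stage used TRL's GRPO trainer. A minimal sketch assuming that API is shown below; the reward function, dataset, and output directory are hypothetical placeholders, not the ones used to train this model:

```python
from trl import GRPOConfig, GRPOTrainer

# Toy reward that favors multi-line, step-by-step completions -- purely illustrative.
def reward_num_steps(completions, **kwargs):
    return [float(c.count("\n")) for c in completions]

# Hyperparameters taken from the list above; remaining values are illustrative defaults.
training_args = GRPOConfig(
    output_dir="llama3.1-8b-reasoning-grpo",   # hypothetical output path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
)

trainer = GRPOTrainer(
    model=model,                      # the Unsloth/LoRA model prepared earlier
    reward_funcs=[reward_num_steps],
    args=training_args,
    train_dataset=reasoning_dataset,  # hypothetical prompt dataset
)
trainer.train()
```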
## Evaluation

### Testing Data

- Structured reasoning datasets drawn from various logic-based tasks.

### Factors

- Model performance was measured on logical consistency and deductive accuracy.

### Metrics

- Logical Entailment Accuracy (LEA)
- Stepwise Deduction Success Rate (SDSR)
### Results

- High accuracy on single-hop reasoning tasks.
- Struggles with highly ambiguous logical chains.
## Environmental Impact

- **Hardware Type:** Tesla T4 (Google Colab)
- **Compute Time:** ~212 minutes (~3.5 hours)
- **Cloud Provider:** Google Colab
- **Compute Region:** N/A
## Technical Specifications

### Model Architecture and Objective

- Based on Llama-3.1 8B with LoRA fine-tuning and vLLM fast inference (see the LoRA sketch below).
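The LoRA configuration itself is not published on this card. The sketch below shows how an adapter is typically attached with Unsloth; the rank, alpha, dropout, and target modules are assumed illustrative values, not the ones used here:

```python
from unsloth import FastLanguageModel

# Illustrative LoRA setup; r, lora_alpha, and target_modules are assumptions.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)
```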
### Compute Infrastructure

Fine-tuned using Unsloth for efficient training and inference.

#### Hardware

- GPU: Tesla T4
- Max reserved memory: ~8 GB
#### Software

- Libraries used: `unsloth`, `transformers`, `trl`, `datasets`
## Citation

**BibTeX:**

```bibtex
@misc{llama3.1-8b-reasoning-grpo,
  author = {Azzedde},
  title  = {Llama3.1-8B-GRPO: A Logical Reasoning LLM},
  year   = {2025},
  url    = {https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo}
}
```
**APA:**

Azzedde. (2025). *Llama3.1-8B-GRPO: A Logical Reasoning LLM*. Hugging Face. https://huggingface.co/Azzedde/llama3.1-8b-reasoning-grpo
## More Information
For questions, reach out via Hugging Face discussions or GitHub issues.
## Model Card Authors
- Azzedde (GitHub: Azzedde)
## Model Card Contact

Contact: https://huggingface.co/Azzedde (Hugging Face profile)