---
library_name: transformers
tags:
- Llama
license: apache-2.0
language:
- sq
base_model:
- deepseek-ai/DeepSeek-R1
---
# Model Card for Llama 8B Distilled from DeepSeek-R1

## Model Details

### Model Description

This is a LLaMA 8B model distilled from DeepSeek-R1 and fine-tuned for improved performance and efficiency, with a focus on Albanian. It is optimized for high-quality output generation and is well suited to natural language processing tasks such as text generation, completion, and classification.
- Developed by: Klei Aliaj
- Funded by: Dialogo
- Shared by: Klei Aliaj
- Language(s) (NLP): Albanian (sq)
- License: Apache-2.0
- Fine-tuned from model: deepseek-ai/DeepSeek-R1
### Model Sources

- Repository: [klei1/bleta-deepseek-r1](https://huggingface.co/klei1/bleta-deepseek-r1)
## Uses

### Direct Use
This model is suitable for use directly in various NLP tasks such as language generation, completion, and question answering, with a focus on Albanian language tasks. It is a great fit for research, conversational AI, and content generation.
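As a minimal illustration of direct use, the model can be wrapped in a `text-generation` pipeline (the prompt below is illustrative):

```python
from transformers import pipeline

# Load the model into a text-generation pipeline
generator = pipeline("text-generation", model="klei1/bleta-deepseek-r1")

# Illustrative Albanian prompt ("Describe the history of the city of Berat.")
prompt = "Përshkruaj historinë e qytetit të Beratit."
result = generator(prompt, max_new_tokens=128)
print(result[0]["generated_text"])
```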
### Downstream Use
When fine-tuned on specific tasks or integrated into larger systems, this model can handle domain-specific needs in applications such as customer support automation, content generation, and more.
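One common option for downstream adaptation is to attach LoRA adapters with the `peft` library and fine-tune on domain-specific data; the sketch below is only illustrative, and the LoRA configuration shown is an assumption rather than the setup used for this model:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model (assumption: standard full-precision load)
model = AutoModelForCausalLM.from_pretrained("klei1/bleta-deepseek-r1")

# Illustrative LoRA hyperparameters, not the values used for this model
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```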
## Bias, Risks, and Limitations
The model has been trained on a large dataset and is designed to generate outputs across a wide variety of contexts. However, it may still reflect biases inherent in the training data, including cultural or language-specific biases.
- Recommendations: Users should evaluate outputs in the context of their specific use case and apply appropriate filtering or human oversight when deploying the model in real-world applications.
## How to Get Started with the Model
To get started, you can install the necessary libraries and load the model as shown below:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "klei1/bleta-deepseek-r1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example inference
input_text = "What are the top 10 lakes in Albania?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
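By default, `generate()` uses greedy decoding and a short maximum length; for longer or more varied responses you can pass explicit generation settings. The values below are illustrative, not tuned for this model:

```python
# Continuing from the snippet above
outputs = model.generate(
    **inputs,
    max_new_tokens=256,   # allow a longer answer
    do_sample=True,       # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```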
## Training Details

### Training Data
The model was fine-tuned on a variety of text data sources, including the Alpaca dataset. Specific preprocessing steps were applied to ensure the dataset was aligned with the model’s intended usage, focusing on language generation tasks in Albanian.
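As a hedged sketch of what such preprocessing can look like (the exact dataset variant and prompt template are not documented here; `tatsu-lab/alpaca` and the template below are assumptions):

```python
from datasets import load_dataset

# Assumption: the original Stanford Alpaca dataset on the Hugging Face Hub
dataset = load_dataset("tatsu-lab/alpaca", split="train")

def format_example(example):
    # Illustrative instruction-following template
    prompt = f"### Instruction:\n{example['instruction']}\n\n"
    if example["input"]:
        prompt += f"### Input:\n{example['input']}\n\n"
    prompt += f"### Response:\n{example['output']}"
    return {"text": prompt}

dataset = dataset.map(format_example)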
### Training Procedure

The training was carried out using Hugging Face's Trainer API with the following parameters (a configuration sketch follows the list):
- Training regime: Mixed precision, using FP16 for faster computations.
- Batch size: 2, with gradient accumulation for larger effective batch sizes.
- Learning rate: 2e-4, with warmup steps to stabilize training.
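The sketch below maps the listed hyperparameters onto `TrainingArguments`; the gradient-accumulation and warmup values are assumptions, and dataset preparation and tokenization are omitted for brevity:

```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="bleta-deepseek-r1-finetune",
    per_device_train_batch_size=2,   # batch size from this card
    gradient_accumulation_steps=4,   # assumption: exact value not documented
    learning_rate=2e-4,
    warmup_steps=5,                  # assumption: warmup step count not documented
    max_steps=60,                    # total steps reported below
    fp16=True,                       # mixed-precision training
    logging_steps=1,
)

trainer = Trainer(
    model=model,                  # the loaded causal LM
    args=training_args,
    train_dataset=train_dataset,  # pre-tokenized dataset (preparation not shown)
)
trainer.train()
```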
### Speeds, Sizes, Times

The model was trained for 60 steps, with a peak memory usage of 7.58 GB on a Tesla T4 GPU. Training time was approximately 13.75 minutes.
## Evaluation
The model's performance was evaluated on a variety of natural language generation tasks in Albanian. Specific evaluation metrics included accuracy and relevance of generated responses to given prompts.
## Environmental Impact
- Hardware Type: Tesla T4
- Hours used: ~0.23 hours (13.75 minutes of training)
- Cloud Provider: Google Colab (Free Tier)
- Compute Region: US
- Carbon Emitted: Estimated at 0.01 kg CO2eq for the training session.
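The figure above is consistent with a rough back-of-the-envelope estimate; the GPU power draw and grid carbon intensity used below are assumptions, not measured values:

```python
# Rough carbon estimate for the reported training run
gpu_power_kw = 0.070          # assumption: Tesla T4 power draw of ~70 W
training_hours = 13.75 / 60   # 13.75 minutes, as reported above
grid_intensity = 0.43         # assumption: ~0.43 kg CO2eq per kWh

energy_kwh = gpu_power_kw * training_hours   # ~0.016 kWh
emissions_kg = energy_kwh * grid_intensity   # ~0.007 kg CO2eq
print(f"{emissions_kg:.3f} kg CO2eq")        # on the order of 0.01 kg
```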
## Technical Specifications

### Model Architecture and Objective
The model is based on the LLaMA architecture, designed for efficient large-scale language modeling, with optimizations for memory and compute usage. It was fine-tuned from the DeepSeek-R1 model, focusing on generating high-quality responses.
### Compute Infrastructure
The model was trained on a Tesla T4 GPU with mixed-precision training (FP16) to optimize performance.
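To keep memory usage low on similar T4-class hardware at inference time, the model can be loaded in half precision (a minimal sketch; `device_map="auto"` requires the `accelerate` package):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "klei1/bleta-deepseek-r1",
    torch_dtype=torch.float16,  # FP16 weights, matching the mixed-precision training setup
    device_map="auto",          # place layers on the available GPU
)
tokenizer = AutoTokenizer.from_pretrained("klei1/bleta-deepseek-r1")
```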
## Citation
BibTeX:
```bibtex
@misc{klei2025bleta,
  author = {Klei Aliaj},
  title  = {Bleta: LLaMA 8B Distilled from DeepSeek-R1},
  year   = {2025},
  url    = {https://huggingface.co/klei1/bleta-deepseek-r1}
}
```
APA:
Aliaj, K. (2025). Bleta: LLaMA 8B Distilled from DeepSeek-R1. Retrieved from https://huggingface.co/klei1/bleta-deepseek-r1