---
library_name: transformers
tags:
- Llama
license: apache-2.0
language:
- sq
base_model:
- deepseek-ai/DeepSeek-R1
---

# Model Card for Llama 8B Distilled from DeepSeek-R1

## Model Details

### Model Description

This is a LLaMA 8B model distilled from DeepSeek-R1 and fine-tuned for improved performance and efficiency on Albanian-language tasks. It is optimized for high-quality output generation and is well suited to natural language processing tasks such as text generation, completion, and classification.

- Developed by: Klei Aliaj
- Funded by: Dialogo
- Shared by: Klei Aliaj
- Language(s) (NLP): Albanian (sq)
- License: Apache-2.0
- Fine-tuned from model: deepseek-ai/DeepSeek-R1

## Uses

### Direct Use

The model can be used directly for NLP tasks such as language generation, text completion, and question answering, with a focus on Albanian. It is a good fit for research, conversational AI, and content generation.
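
As a quick way to try it, the model can be loaded through the standard `transformers` text-generation pipeline. This is a minimal sketch; the Albanian prompt and the `max_new_tokens` value are illustrative choices, not recommendations from the model authors.

```python
from transformers import pipeline

# Load the model via the text-generation pipeline
# (repository name as given in "How to Get Started" below).
generator = pipeline("text-generation", model="klei1/bleta-deepseek-r1")

# "What is the capital of Albania?"
prompt = "Cili është kryeqyteti i Shqipërisë?"
result = generator(prompt, max_new_tokens=128)
print(result[0]["generated_text"])
```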

### Downstream Use

When fine-tuned on task-specific data or integrated into larger systems, the model can address domain-specific needs in applications such as customer support automation and content generation.
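
The card does not specify the fine-tuning stack, so the following is only a sketch of how a downstream adaptation could be set up with the `peft` library (LoRA); the rank, target modules, and other hyperparameters are illustrative assumptions, not the settings used for this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "klei1/bleta-deepseek-r1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Illustrative LoRA configuration; values are assumptions, not the ones used
# to train this model.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```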

## Bias, Risks, and Limitations

The model was trained on a large dataset and is designed to generate outputs across a wide variety of contexts. However, it may still reflect biases present in the training data, including cultural or language-specific biases.

- Recommendations: Users should evaluate outputs in the context of their specific use case and apply appropriate filtering or human oversight when deploying the model in real-world applications.

## How to Get Started with the Model

Install the `transformers` library, then load the model and tokenizer as shown below:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "klei1/bleta-deepseek-r1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example inference
input_text = "What are the top 10 lakes in Albania?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)  # allow room for a full answer
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

The model was fine-tuned on a variety of text sources, including the Alpaca dataset. Preprocessing steps were applied to align the data with the model's intended usage, focusing on language-generation tasks in Albanian.
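
The exact preprocessing pipeline is not published with this card; the snippet below is only a sketch of the common Alpaca-style prompt formatting, and the template wording is an assumption rather than the one used for this checkpoint.

```python
# Illustrative Alpaca-style prompt template (an assumption, not the exact
# template used to train this model).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)

def format_example(example: dict) -> str:
    """Turn one Alpaca-style record into a single training string."""
    return ALPACA_TEMPLATE.format(
        instruction=example["instruction"],
        response=example["output"],
    )

print(format_example({
    "instruction": "Listo tre liqene në Shqipëri.",  # "List three lakes in Albania."
    "output": "Liqeni i Shkodrës, Liqeni i Ohrit dhe Liqeni i Prespës.",
}))
```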

### Training Procedure

Training was carried out with Hugging Face's Trainer API using the following parameters:

- Training regime: mixed precision (FP16) for faster computation.
- Batch size: 2, with gradient accumulation for a larger effective batch size.
- Learning rate: 2e-4, with warmup steps to stabilize training.
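
For reference, these hyperparameters map onto `TrainingArguments` roughly as follows. This is a sketch, not the exact training script: values not stated in the card (the gradient-accumulation factor and the number of warmup steps) are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bleta-deepseek-r1-finetune",
    per_device_train_batch_size=2,   # batch size 2 (from the card)
    gradient_accumulation_steps=4,   # assumption: effective batch size of 8
    learning_rate=2e-4,              # from the card
    warmup_steps=5,                  # assumption: short warmup for a 60-step run
    max_steps=60,                    # total steps reported under "Speeds, Sizes, Times"
    fp16=True,                       # mixed-precision training (from the card)
    logging_steps=1,
)
```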

### Speeds, Sizes, Times

The model was trained for 60 steps, with peak memory usage of 7.58 GB on a Tesla T4 GPU. Training time was approximately 13.75 minutes.

## Evaluation

The model's performance was evaluated on a range of natural language generation tasks in Albanian. Evaluation focused on the accuracy and relevance of generated responses to the given prompts.

## Environmental Impact

- Hardware Type: Tesla T4
- Hours used: approximately 0.23 hours (13.75 minutes of training)
- Cloud Provider: Google Colab (Free Tier)
- Compute Region: US
- Carbon Emitted: estimated at 0.01 kg CO2eq for the training session
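
As a rough cross-check, an estimate of this order can be reproduced from the hardware power draw and runtime; the power and grid-intensity figures below are generic assumptions, not measured values.

```python
# Back-of-the-envelope carbon estimate (assumed figures, not measurements).
gpu_power_kw = 0.07          # assumption: ~70 W average draw for a Tesla T4
runtime_hours = 13.75 / 60   # 13.75 minutes of training (from the card)
grid_intensity = 0.4         # assumption: ~0.4 kg CO2eq per kWh

energy_kwh = gpu_power_kw * runtime_hours
emissions_kg = energy_kwh * grid_intensity
print(f"{emissions_kg:.3f} kg CO2eq")  # roughly the same order as 0.01 kg CO2eq
```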

## Technical Specifications

### Model Architecture and Objective

The model is based on the LLaMA architecture, designed for efficient large-scale language modeling with optimizations for memory and compute usage. It was fine-tuned from the DeepSeek-R1 model with a focus on generating high-quality responses.

### Compute Infrastructure

The model was trained on a Tesla T4 GPU with mixed-precision (FP16) training to optimize performance.
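
Since training used FP16, loading the checkpoint in half precision for inference keeps the memory footprint small on comparable hardware. The snippet below is a sketch; the `device_map="auto"` placement assumes the `accelerate` package is installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "klei1/bleta-deepseek-r1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load in half precision to fit comfortably on a single 16 GB GPU such as a T4.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the `accelerate` package
)
```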

## Citation

BibTeX:

```bibtex
@misc{klei2025bleta,
  author = {Klei Aliaj},
  title = {Bleta: LLaMA 8B Distilled from DeepSeek-R1},
  year = {2025},
  url = {https://huggingface.co/klei1/bleta-deepseek-r1}
}
```

APA:

Aliaj, K. (2025). Bleta: LLaMA 8B Distilled from DeepSeek-R1. Retrieved from https://huggingface.co/klei1/bleta-deepseek-r1

