|
|
--- |
|
|
license: mit |
|
|
datasets: |
|
|
- mikasenghaas/wikitext-2 |
|
|
language: |
|
|
- en |
|
|
metrics: |
|
|
- bleu |
|
|
- rouge |
|
|
- perplexity |
|
|
- accuracy |
|
|
base_model: |
|
|
- openai-community/gpt2 |
|
|
tags: |
|
|
- Quantized |
|
|
- Pruned |
|
|
- Small |
|
|
- Nano |
|
|
- SBC |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
|
|
|
# Model Card: Pruned & Quantized GPT-2 Fine-Tuned on WikiText-2 |
|
|
|
|
|
## Model Summary |
|
|
|
|
|
This model is a pruned and quantized version of the GPT-2 architecture, fine-tuned on the WikiText-2 dataset. The pruning and quantization techniques reduce the model's size and computational requirements, making it suitable for deployment in resource-constrained environments, such as edge devices or applications with limited computational power. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Developed by |
|
|
|
|
|
- **Developer:** SynSci
|
|
- **Contact:** [[email protected]] |
|
|
|
|
|
### Model Description |
|
|
|
|
|
- **Architecture:** GPT-2 (Generative Pre-trained Transformer 2) |
|
|
- **Model Type:** Transformer-based language model |
|
|
- **Base Model:** [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) |
|
|
- **Language:** English |
|
|
- **License:** MIT |
|
|
- **Fine-tuned on:** [mikasenghaas/wikitext-2](https://huggingface.co/datasets/mikasenghaas/wikitext-2) |
|
|
- **Modifications:** |
|
|
  - **Pruning:** Redundant weights removed to decrease model size and inference time.
|
|
  - **Quantization:** Weights quantized to 8-bit integers to reduce memory footprint and improve efficiency (a toy illustration of the quantization mapping follows below).
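
As a toy illustration of what 8-bit quantization does (not the exact scheme used for this model; PyTorch's dynamic quantization handles these details internally), each float weight is mapped onto one of 256 integer levels via a scale and zero point:

```python
# Toy sketch of 8-bit affine quantization of a weight tensor, for intuition only.
import torch

w = torch.randn(4, 4)                                    # float32 weights
scale = (w.max() - w.min()) / 255                        # spread the observed range over 256 levels
zero_point = torch.round(-w.min() / scale)               # integer offset so that 0.0 is representable
w_int8 = torch.clamp(torch.round(w / scale) + zero_point, 0, 255).to(torch.uint8)
w_dequant = (w_int8.float() - zero_point) * scale        # approximate reconstruction at inference time
print((w - w_dequant).abs().max())                       # error is bounded by roughly scale / 2
```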
|
|
|
|
|
### Direct Use |
|
|
|
|
|
- Text generation |
|
|
- Language modeling |
|
|
- Autocomplete suggestions |
|
|
- Educational purposes in NLP and model optimization techniques |
|
|
|
|
|
### Downstream Use |
|
|
|
|
|
- Integration into applications requiring efficient language models |
|
|
- Deployment on devices with limited computational resources |
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
- Generation of misleading or harmful content |
|
|
- Applications requiring understanding of languages other than English |
|
|
- Tasks demanding high-precision language understanding beyond the model's capabilities |
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
### Biases |
|
|
|
|
|
The model inherits biases present in the GPT-2 architecture and the WikiText-2 dataset, which consists of Wikipedia articles. These biases may include underrepresentation of certain topics or perspectives. |
|
|
|
|
|
### Risks |
|
|
|
|
|
- Potential generation of biased or inappropriate content |
|
|
- Misinterpretation of generated text as factual information |
|
|
|
|
|
### Limitations |
|
|
|
|
|
- Reduced performance compared to the full-sized GPT-2 model due to pruning and quantization |
|
|
- Limited to English language understanding and generation |
|
|
- Not suitable for tasks requiring real-time processing of large-scale data |
|
|
|
|
|
### Recommendations |
|
|
|
|
|
Users should: |
|
|
|
|
|
- Implement content filtering mechanisms to prevent the generation of inappropriate content (a minimal example follows this list).
|
|
- Avoid using the model for critical applications without thorough evaluation. |
|
|
- Be aware of the model's limitations in understanding nuanced language and context. |
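
As a minimal illustration of the first recommendation, generated text can be passed through a simple blocklist check before it is shown to users. The `BLOCKLIST` contents and the `filtered_generate` helper are hypothetical placeholders; production systems should rely on a dedicated moderation model or service instead:

```python
# Illustrative post-generation filter; BLOCKLIST and filtered_generate are placeholder names.
BLOCKLIST = {"example_banned_term"}  # replace with real terms or a proper moderation classifier

def is_safe(text: str) -> bool:
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

def filtered_generate(model, tokenizer, prompt: str, **gen_kwargs) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **gen_kwargs)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return text if is_safe(text) else "[generation withheld by content filter]"
```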
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
```python |
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
|
|
tokenizer = AutoTokenizer.from_pretrained("swayamsingal/NanoQuant") |
|
|
model = AutoModelForCausalLM.from_pretrained("swayamsingal/NanoQuant") |
|
|
|
|
|
input_text = "Once upon a time" |
|
|
inputs = tokenizer(input_text, return_tensors="pt") |
|
|
outputs = model.generate(**inputs, max_new_tokens=50) |
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
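
Continuing from the snippet above, inference can optionally run on an Apple Silicon GPU through PyTorch's MPS backend. This is a generic PyTorch pattern rather than anything specific to this model, and any speedup depends on your hardware and on how the published weights are stored:

```python
import torch

# Move the model and inputs to the MPS device when available, otherwise stay on CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```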
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
- **Dataset:** [mikasenghaas/wikitext-2](https://huggingface.co/datasets/mikasenghaas/wikitext-2) |
|
|
- **Description:** A collection of approximately 2 million training tokens extracted from verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
- **Preprocessing:** Standard tokenization and formatting compatible with GPT-2 requirements. |
|
|
- **Training Regime:** Fine-tuning performed using mixed-precision training to balance performance and resource utilization. |
|
|
- **Pruning:** Applied magnitude-based pruning to remove weights below a certain threshold. |
|
|
- **Quantization:** Post-training dynamic quantization of weights to 8-bit integers (a combined sketch of the pruning and quantization steps follows below).
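
A minimal sketch of both steps with stock PyTorch utilities is shown below. The 30% sparsity level and the restriction to `torch.nn.Linear` modules are assumptions made for brevity (GPT-2's attention and MLP projections are `transformers` `Conv1D` layers, which a faithful script would also cover); this is not the exact script used to produce the released weights:

```python
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# Magnitude-based (L1) unstructured pruning: zero out the smallest 30% of weights per layer.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weight tensor

# Post-training dynamic quantization: weights stored as int8, activations quantized on the fly.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```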
|
|
|
|
|
### Hyperparameters |
|
|
|
|
|
- **Learning Rate:** 5e-5 |
|
|
- **Batch Size:** 32 |
|
|
- **Epochs:** 3 |
|
|
- **Optimizer:** AdamW |
|
|
- **Weight Decay:** 0.01 |
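
Expressed as a `transformers` `TrainingArguments` configuration, the hyperparameters above would look roughly as follows; the `output_dir` value is a placeholder and this is not the author's original training script:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-wikitext2-finetune",  # placeholder output path
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,
    optim="adamw_torch",                   # AdamW optimizer
    # fp16/bf16 mixed-precision flags depend on the available hardware.
)
```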
|
|
|
|
|
### Speeds, Sizes, Times |
|
|
|
|
|
- **Original Model Size:** ~500 MB |
|
|
- **Pruned & Quantized Model Size:** ~6 MB |
|
|
- **Training Time:** Approximately 2 hours on a single Apple Silicon GPU via the MPS backend
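
The size figures above are the reported numbers for this release. As a rough illustration of how such a figure can be measured for any PyTorch model, you can serialize its state dict and inspect the file size (the result depends on how the published weights are actually stored):

```python
import os
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("swayamsingal/NanoQuant")
torch.save(model.state_dict(), "nanoquant_state_dict.pt")
print(f"on-disk size: {os.path.getsize('nanoquant_state_dict.pt') / 1e6:.1f} MB")
```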
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Testing Data |
|
|
|
|
|
- **Dataset:** [mikasenghaas/wikitext-2](https://huggingface.co/datasets/mikasenghaas/wikitext-2) |
|
|
- **Split:** Validation set used for evaluation |
|
|
|
|
|
### Metrics |
|
|
|
|
|
- **Perplexity:** 155.43 |
|
|
- **BLEU Score:** 0.0498 |
|
|
- **ROUGE-1 Score:** 0.1836 |
|
|
- **Accuracy:** 93.2% |
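
The metrics above are the reported results. As a sketch of how a perplexity number can be reproduced on the validation split, a simple non-overlapping 1024-token-window evaluation is shown below; it assumes `mikasenghaas/wikitext-2` exposes a `validation` split with a `text` column (as the original WikiText configs do), and the reported figures may have been computed with a different protocol:

```python
import math
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("swayamsingal/NanoQuant")
model = AutoModelForCausalLM.from_pretrained("swayamsingal/NanoQuant").eval()

# Concatenate the validation documents and tokenize them once.
text = "\n\n".join(load_dataset("mikasenghaas/wikitext-2", split="validation")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

max_len = 1024
nlls, n_tokens = [], 0
for begin in range(0, ids.size(1) - 1, max_len):
    window = ids[:, begin:begin + max_len]
    with torch.no_grad():
        out = model(window, labels=window)  # loss = mean NLL over the window's next-token predictions
    nlls.append(out.loss * (window.size(1) - 1))
    n_tokens += window.size(1) - 1

print("perplexity:", math.exp(torch.stack(nlls).sum().item() / n_tokens))
```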
|
|
|
|
|
### Results Summary |
|
|
|
|
|
The pruned and quantized model achieves competitive performance on the WikiText-2 validation set, with a significant reduction in model size and inference time compared to the original GPT-2 model. |
|
|
|
|
|
## Model Examination |
|
|
|
|
|
While specific interpretability analyses were not conducted, the model's architecture remains consistent with GPT-2, and standard transformer interpretability techniques can be applied. |
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
- **Hardware Type:** MacBook with Apple Silicon, using the MPS backend (no CUDA GPU was available)
|
|
- **Training Duration:** 2 hours |
|
|
- **Energy Consumption:** Approximately 0.5 kWh |
|
|
- **Carbon Emitted:** Estimated 0.2 kg CO₂ |
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
### Model Architecture and Objective |
|
|
|
|
|
- **Architecture:** Transformer decoder with 12 layers, 12 attention heads, and a hidden size of 768. |
|
|
- **Objective:** Causal language modeling (predicting the next token in a sequence). |
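
Concretely, the objective is the mean negative log-likelihood of each token given the tokens before it. With the standard `transformers` API this loss is computed automatically when `labels` are supplied, as the short example below shows using the base GPT-2 checkpoint:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

batch = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    out = model(**batch, labels=batch["input_ids"])  # labels are shifted by one position internally
print(out.loss)  # mean negative log-likelihood of the next-token predictions
```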
|
|
|
|
|
### Compute Infrastructure |
|
|
|
|
|
- **Hardware:** Apple Silicon GPU on a MacBook, via PyTorch's MPS backend
|
|
- **Software:** PyTorch, Transformers library by Hugging Face |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{NanoQuant, |
|
|
title={NanoQuant}, |
|
|
author={swayamsingal}, |
|
|
year={2025}, |
|
|
howpublished={\url{https://huggingface.co/swayamsingal/NanoQuant}}, |
|
|
} |
|
|
``` |
|
|
|
|
|
## Glossary |
|
|
|
|
|
- **Pruning:** The process of removing weights from a neural network to reduce its size and computational requirements. |
|
|
- **Quantization:** The process of reducing the precision of the weights in a neural network, typically to 8-bit integers, to decrease model size and increase inference speed. |