---
language:
- en
license:
- gpl-3.0
- other
tags:
- text-generation
- language-model
- gpt
- transformer
- open-source
- squad
- wikipedia
datasets:
- squad
metrics:
- perplexity
- text-generation-quality
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: OpenLLM Small Extended 6k
  results:
  - task:
      type: text-generation
    dataset:
      type: squad
      name: SQuAD Wikipedia Passages
    metrics:
    - type: perplexity
      value: 816.04
    - type: training_loss
      value: 5.4302
---
# OpenLLM Small Extended 6k
This is the OpenLLM Small Extended model, trained for 6,000 steps on Wikipedia passages from the SQuAD dataset.
## Model Details
- **Model Type:** GPT-style Transformer
- **Architecture:** Small (35.8M parameters)
- **Training Steps:** 6,000
- **Training Data:** ~41k Wikipedia passages from the SQuAD dataset
- **Tokenizer:** SentencePiece BPE with a 32k vocabulary (see the quick check after this list)
- **License:** GPL-3.0 (Open Source) / Commercial License available
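As a quick sanity check, the tokenizer can be loaded on its own and inspected. This is only a minimal sketch, assuming the tokenizer files are published in the repository in a standard `transformers`-compatible format:

```python
from transformers import AutoTokenizer

# Load the tokenizer shipped with the model repository.
tokenizer = AutoTokenizer.from_pretrained("lemms/openllm-small-extended-6k")

# The vocabulary size should match the 32k listed above.
print(tokenizer.vocab_size)
print(tokenizer.tokenize("The history of artificial intelligence"))
```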
## Model Performance
- **Final Training Loss:** 5.4302
- **Model Parameters:** 35,823,616
- **Context Length:** 512 tokens
- **Training Hardware:** CPU/GPU compatible
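Perplexity is the exponential of the average cross-entropy loss on held-out text (the 816.04 figure in the metadata above was measured on SQuAD Wikipedia passages). The snippet below is only a hedged sketch of that calculation with the published checkpoint, not the original evaluation script; `eval_texts` is a hypothetical placeholder for a held-out split.

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "lemms/openllm-small-extended-6k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# eval_texts stands in for a held-out set of Wikipedia passages.
eval_texts = [
    "Artificial intelligence is a branch of computer science.",
    "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
]

total_loss = 0.0
with torch.no_grad():
    for text in eval_texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        # Passing labels=input_ids makes the model return the mean cross-entropy loss.
        out = model(**enc, labels=enc.input_ids)
        total_loss += out.loss.item()

# Mean of per-passage losses; a token-weighted average would be slightly more precise.
perplexity = math.exp(total_loss / len(eval_texts))
print(f"Perplexity: {perplexity:.2f}")
```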
## Usage
### Using Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "lemms/openllm-small-extended-6k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=50,
        temperature=0.7,
        top_k=40,
        do_sample=True,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
### Using the Custom Loader
```python
# Use the provided load_hf_model.py script
from load_hf_model import load_model_and_tokenizer
model, tokenizer = load_model_and_tokenizer()
# ... rest of usage
```
## Training Details
This model was trained using the OpenLLM training pipeline:
1. **Data Preparation:** SQuAD dataset processing (~41k Wikipedia passages)
2. **Tokenizer Training:** SentencePiece BPE with a 32k vocabulary (see the sketch after this list)
3. **Model Training:** GPT-style transformer for 6,000 steps
4. **Evaluation:** Perplexity and text generation quality assessment
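For illustration, step 2 can be reproduced with the `sentencepiece` library. This is a minimal sketch, not the project's actual training script; the file name `squad_passages.txt` (one passage per line) and the `openllm_bpe` output prefix are hypothetical placeholders.

```python
import sentencepiece as spm

# Train a 32k-vocabulary BPE tokenizer on the prepared passages.
spm.SentencePieceTrainer.train(
    input="squad_passages.txt",
    model_prefix="openllm_bpe",
    vocab_size=32000,
    model_type="bpe",
)

# Load the trained tokenizer and encode a sample sentence.
sp = spm.SentencePieceProcessor(model_file="openllm_bpe.model")
print(sp.encode("The history of artificial intelligence", out_type=str))
```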
## Model Architecture
- **Layers:** 12 transformer layers
- **Attention Heads:** 12
- **Hidden Size:** 768
- **Intermediate Size:** 3072
- **Activation:** GELU
- **Layer Norm:** Pre-norm
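These hyperparameters can be cross-checked against the configuration published with the checkpoint. A minimal sketch, assuming the repository exposes a standard `transformers` config (add `trust_remote_code=True` if the architecture turns out to be custom):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the published configuration and inspect its hyperparameters.
config = AutoConfig.from_pretrained("lemms/openllm-small-extended-6k")
print(config)  # layers, heads, hidden size, context length, vocab size

# Instantiate an untrained model from the config and count its parameters.
model = AutoModelForCausalLM.from_config(config)
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")
```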
## Limitations
- **Training Data:** Limited to Wikipedia passages
- **Context Length:** 512 tokens maximum
- **Model Size:** Small model with 35.8M parameters
- **Performance:** Basic text generation capabilities
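Because the context window is capped at 512 tokens, longer inputs should be truncated (or chunked) before generation. A small sketch using the tokenizer's built-in truncation; `long_text` is a hypothetical over-length document:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lemms/openllm-small-extended-6k")

# long_text stands in for a document longer than the 512-token context window.
long_text = "OpenLLM is an open source language model. " * 500

# Truncate at the model's limit so generation inputs never overflow the context.
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=512)
print(inputs.input_ids.shape)  # at most torch.Size([1, 512])
```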
## License
This model is dual-licensed:
- **Open Source:** GPL-3.0 for research and community use
- **Commercial:** Commercial license available for enterprise use
For commercial licensing, contact: [email protected]
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{openllm2024,
  title={OpenLLM: Open Source Large Language Model},
  author={Louis Chua Bean Chong},
  year={2024},
  url={https://github.com/louischua/openllm}
}
```
## Links
- **Repository:** https://github.com/louischua/openllm
- **Documentation:** https://github.com/louischua/openllm/docs
- **Training Pipeline:** https://github.com/louischua/openllm/docs/training_pipeline.md