# OpenLLM Small Extended 6k

This is the OpenLLM Small Extended model, trained for 6,000 steps on Wikipedia passages from the SQuAD dataset.
## Model Details
- Model Type: GPT-style Transformer
- Architecture: Small (35.8M parameters)
- Training Steps: 6,000
- Training Data: ~41k Wikipedia passages from the SQuAD dataset
- Tokenizer: SentencePiece BPE (32k vocabulary)
- License: GPL-3.0 (Open Source) / Commercial License available
## Model Performance
- Final Training Loss: 5.4302
- Model Parameters: 35,823,616
- Context Length: 512 tokens
- Training Hardware: CPU/GPU compatible
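Perplexity on held-out text can be estimated from the model's average cross-entropy loss. The sketch below shows one way to do this with the standard `transformers` API; the passage is an arbitrary placeholder, and the result will not necessarily match the self-reported figure in the Evaluation Results section.

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "lemms/openllm-small-extended-6k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Any held-out passage works here; this sentence is just a placeholder.
text = "Artificial intelligence is the study of intelligent agents."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Supplying labels makes the model return the mean cross-entropy loss.
    loss = model(**inputs, labels=inputs.input_ids).loss

print(f"Perplexity: {math.exp(loss.item()):.2f}")
```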
## Usage

### Using Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "lemms/openllm-small-extended-6k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=50,
        temperature=0.7,
        top_k=40,
        do_sample=True,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
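The example above uses sampled decoding (`do_sample=True` with `temperature=0.7` and `top_k=40`), which trades determinism for variety in the output. Passing `do_sample=False` to `generate()` switches to greedy decoding if reproducible output is preferred.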
### Using the Custom Loader

```python
# Use the provided load_hf_model.py script
from load_hf_model import load_model_and_tokenizer

model, tokenizer = load_model_and_tokenizer()
# ... rest of usage
```
## Training Details
This model was trained using the OpenLLM training pipeline:
- Data Preparation: SQuAD dataset processing (~41k passages)
- Tokenizer Training: SentencePiece BPE with 32k vocabulary (see the sketch after this list)
- Model Training: GPT-style transformer for 6,000 steps
- Evaluation: Perplexity and text generation quality assessment
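As a rough illustration of the tokenizer-training step above, the snippet below trains a 32k-vocabulary SentencePiece BPE model on a plain-text corpus with one passage per line. The file names and model prefix are placeholders, not the project's actual paths.

```python
import sentencepiece as spm

# Train a BPE tokenizer with a 32k vocabulary on a plain-text corpus.
# "corpus.txt" and "openllm_bpe_32k" are placeholder names.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="openllm_bpe_32k",
    vocab_size=32000,
    model_type="bpe",
)

# Load the trained tokenizer and encode a sample sentence.
sp = spm.SentencePieceProcessor(model_file="openllm_bpe_32k.model")
print(sp.encode("The history of artificial intelligence", out_type=str))
```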
## Model Architecture
- Layers: 12 transformer layers
- Attention Heads: 12
- Hidden Size: 768
- Intermediate Size: 3072
- Activation: GELU
- Layer Norm: Pre-norm
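For reference, a configuration with these hyperparameters can be instantiated as a standard GPT-2-style model in `transformers`. This is only a sketch of an equivalent architecture; the checkpoint's authoritative settings can be inspected with `AutoConfig.from_pretrained("lemms/openllm-small-extended-6k")`.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# GPT-2-style configuration using the hyperparameters listed above.
config = GPT2Config(
    vocab_size=32000,            # SentencePiece BPE vocabulary
    n_positions=512,             # maximum context length
    n_embd=768,                  # hidden size
    n_layer=12,                  # transformer layers
    n_head=12,                   # attention heads
    n_inner=3072,                # feed-forward (intermediate) size
    activation_function="gelu",  # GELU activation; GPT-2 blocks are pre-norm by default
)

model = GPT2LMHeadModel(config)
print(f"{model.num_parameters():,} parameters in this configuration")
```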
## Limitations
- Training Data: Limited to Wikipedia passages
- Context Length: 512 tokens maximum
- Model Size: Small model with 35.8M parameters
- Performance: Basic text generation capabilities
## License
This model is dual-licensed:
- Open Source: GPL-3.0 for research and community use
- Commercial: Commercial license available for enterprise use
For commercial licensing, contact: [email protected]
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{openllm2024,
  title={OpenLLM: Open Source Large Language Model},
  author={Louis Chua Bean Chong},
  year={2024},
  url={https://github.com/louischua/openllm}
}
```
## Links
- Repository: https://github.com/louischua/openllm
- Documentation: https://github.com/louischua/openllm/docs
- Training Pipeline: https://github.com/louischua/openllm/docs/training_pipeline.md
## Evaluation Results

- Perplexity (SQuAD Wikipedia passages, self-reported): 816.040
- Training loss (SQuAD Wikipedia passages, self-reported): 5.430