# OpenLLM Small Extended 7K Model

## Model Overview
This is the OpenLLM Small Extended 7K model, a 35.8M parameter GPT-style language model trained for 7,000 steps on Wikipedia passages from the SQuAD dataset. This model represents the latest iteration of our small model architecture with extended training.
## Model Specifications
- Architecture: GPT-style Transformer
- Parameters: 35,823,616 (35.8M)
- Layers: 6 transformer layers
- Heads: 8 attention heads
- Embedding Dimension: 512
- Vocabulary Size: 32,000 tokens
- Context Length: 1,024 tokens
- Training Steps: 7,000
- Model Size: Small
## Training Details
- Dataset: Wikipedia passages from SQuAD dataset (~41k passages)
- Tokenization: SentencePiece with 32k vocabulary
- Training Objective: Next token prediction (causal language modeling)
- Optimizer: AdamW with learning rate scheduling
- Hardware: Trained on a consumer GPU with gradient accumulation (see the sketch below)
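The training loop itself is not part of this export. As a rough illustration of the setup described above (next-token prediction, AdamW with a learning-rate schedule, gradient accumulation), here is a minimal sketch; `train_loader`, the accumulation count, and the learning-rate values are placeholders, not the settings actually used to train this model:

```python
# Minimal sketch of the training step described above; hyperparameters and
# the data loader are placeholders, not the values used to train this model.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("path/to/huggingface")
optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = CosineAnnealingLR(optimizer, T_max=7000)  # one step per optimizer update
accum_steps = 8                                       # gradient accumulation (assumed)

model.train()
for step, batch in enumerate(train_loader):           # train_loader: your own DataLoader
    # Causal LM loss: with labels=input_ids, the model internally shifts the
    # targets so each position predicts the next token.
    outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
    (outputs.loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```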
## Model Files
```
huggingface/
├── config.json              # Model configuration
├── generation_config.json   # Generation parameters
├── pytorch_model.bin        # Model weights (161MB)
├── tokenizer_config.json    # Tokenizer configuration
├── tokenizer.model          # SentencePiece tokenizer
└── load_hf_model.py         # Loading script
```
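Because `tokenizer.model` is a plain SentencePiece model, it can also be inspected directly with the `sentencepiece` library. This is optional (the `AutoTokenizer` path shown under Usage wraps it), and the path below is a placeholder:

```python
# Inspect the SentencePiece model directly (optional; AutoTokenizer wraps this).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="path/to/huggingface/tokenizer.model")
print(sp.vocab_size())                                   # expected: 32000
ids = sp.encode("The history of artificial intelligence", out_type=int)
print(ids)                                               # token ids
print(sp.decode(ids))                                    # round-trips back to the text
```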
## Usage

### Loading with Hugging Face Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "path/to/huggingface"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
### Using the Custom Loader
```python
from load_hf_model import load_openllm_model

# Load the model using our custom loader
model, tokenizer = load_openllm_model("path/to/huggingface")

# Generate text
prompt = "Explain quantum computing in simple terms"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=150,
    do_sample=True,   # sampling must be enabled for temperature/top_p to take effect
    temperature=0.8,
    top_p=0.9
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Inference Server
```bash
# Start the FastAPI inference server
python core/src/inference_server.py \
    --model_path exports/huggingface-7k/huggingface \
    --port 8000

# Make API calls
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "The future of renewable energy",
    "max_tokens": 100,
    "temperature": 0.7
  }'
```
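The same endpoint can be called from Python. The sketch below reuses the request fields shown in the curl example; the response schema is not documented here, so it simply prints whatever JSON the server returns:

```python
# Minimal Python client for the /generate endpoint shown above.
# Field names follow the curl example; the response schema is assumed.
import requests

payload = {
    "prompt": "The future of renewable energy",
    "max_tokens": 100,
    "temperature": 0.7,
}
resp = requests.post("http://localhost:8000/generate", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())   # print whatever the server returns
```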
## Performance

### Training Metrics
- Final Loss: ~2.1 (cross-entropy)
- Training Time: ~7 hours on consumer GPU
- Memory Usage: ~2GB VRAM during training
- Inference Speed: ~50 tokens/second on CPU, ~200 tokens/second on GPU
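Throughput figures like these depend heavily on hardware, batch size, and generation settings. A rough way to measure tokens/second on your own machine (a sketch, reusing the placeholder path from the Usage section):

```python
# Rough throughput measurement: time a single greedy generation pass.
import time
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "path/to/huggingface"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tokenizer("The history of artificial intelligence", return_tensors="pt")
new_tokens = 128
start = time.time()
with torch.no_grad():
    model.generate(
        inputs.input_ids,
        min_new_tokens=new_tokens,   # force a full pass so the timing is comparable
        max_new_tokens=new_tokens,
        do_sample=False,
    )
elapsed = time.time() - start
print(f"{new_tokens / elapsed:.1f} tokens/second")
```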
### Model Capabilities
- Text Generation: Coherent paragraph generation
- Question Answering: Basic factual responses
- Summarization: Short text summarization
- Language Understanding: Context-aware responses
## Configuration

### Generation Parameters
```json
{
  "max_length": 512,
  "max_new_tokens": 256,
  "temperature": 0.7,
  "top_k": 40,
  "top_p": 0.9,
  "do_sample": true,
  "pad_token_id": 0,
  "eos_token_id": 1,
  "bos_token_id": 2
}
```
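These defaults are stored in `generation_config.json`, so they can be loaded through transformers' `GenerationConfig` and overridden per call instead of being repeated at every call site. A sketch, reusing `model` and `inputs` from the Usage section and the placeholder path:

```python
# Load the shipped generation defaults and override individual fields per call.
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained("path/to/huggingface")
print(gen_config.temperature, gen_config.top_k, gen_config.top_p)  # 0.7 40 0.9

outputs = model.generate(
    inputs.input_ids,
    generation_config=gen_config,
    max_new_tokens=64,   # per-call override of the shipped default
)
```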
### Model Architecture
```json
{
  "vocab_size": 32000,
  "n_layer": 6,
  "n_head": 8,
  "n_embd": 512,
  "block_size": 1024,
  "dropout": 0.1,
  "bias": true
}
```
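As a sanity check, these hyperparameters reproduce the reported parameter count, assuming a standard GPT-2-style block (4x MLP expansion, biases and LayerNorms as configured above) with the token embedding tied to the output head:

```python
# Back-of-the-envelope parameter count from the architecture config above.
vocab_size, n_layer, n_embd, block_size = 32000, 6, 512, 1024

token_emb = vocab_size * n_embd              # 16,384,000
pos_emb = block_size * n_embd                #    524,288
attn = 4 * n_embd * n_embd + 4 * n_embd      # QKV + output projection (+ biases)
mlp = 8 * n_embd * n_embd + 5 * n_embd       # 4x expansion (+ biases)
layer_norms = 2 * 2 * n_embd                 # two LayerNorms per block (weight + bias)
per_layer = attn + mlp + layer_norms
final_ln = 2 * n_embd

total = token_emb + pos_emb + n_layer * per_layer + final_ln
print(total)  # 35,823,616 -- matches the reported parameter count
```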
## Testing

### Quick Test
```python
# Test the model with a simple prompt
test_prompt = "Hello, how are you today?"
inputs = tokenizer(test_prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=20,
        do_sample=True,   # enable sampling so the temperature setting is used
        temperature=0.7
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Input: {test_prompt}")
print(f"Output: {response}")
```
## Limitations
- Context Length: Limited to 1,024 tokens; longer prompts must be truncated (see the snippet after this list)
- Training Data: Only Wikipedia passages (limited domain)
- Model Size: Small model with limited reasoning capabilities
- Bias: May inherit biases from training data
- Factual Accuracy: Not guaranteed for current events
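For prompts that may exceed the 1,024-token context window, truncate at tokenization time and leave room for the new tokens. A minimal example, reusing the tokenizer and model from the Usage section (`long_prompt` is a placeholder):

```python
# Keep prompts within the 1,024-token context window before calling generate().
inputs = tokenizer(
    long_prompt,                 # placeholder for an arbitrarily long string
    return_tensors="pt",
    truncation=True,
    max_length=1024 - 100,       # reserve room for 100 generated tokens
)
outputs = model.generate(inputs.input_ids, max_new_tokens=100)
```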
## Model Comparison
| Model | Parameters | Training Steps | Context Length | Use Case |
|---|---|---|---|---|
| Small 4K | 35.8M | 4,000 | 1,024 | Basic text generation |
| Small 6K | 35.8M | 6,000 | 1,024 | Improved coherence |
| Small 7K | 35.8M | 7,000 | 1,024 | Extended training |
## License
This model is dual-licensed:
- Open Source: GNU General Public License v3.0
- Commercial: Commercial License (contact for details)
See LICENSE and docs/LICENSES.md for full license information.
## Contributing
We welcome contributions to improve the model! Please see:
- docs/CONTRIBUTING.md for contribution guidelines
- docs/CODE_OF_CONDUCT.md for community standards
## Support
For questions, issues, or commercial licensing:
- GitHub Issues: Report bugs and feature requests
- Documentation: Check the docs/ directory
- Commercial License: Contact for enterprise use
- Author: Louis Chua Bean Chong
- Project: OpenLLM - Open Source Large Language Model
- Version: 0.1.0
- Last Updated: 2024
## Evaluation Results
- Loss on Wikipedia passages from SQuAD (self-reported): 2.100
- Perplexity on Wikipedia passages from SQuAD (self-reported): 8.200