OpenLLM Small Extended 7K Model

🌟 Model Overview

This is the OpenLLM Small Extended 7K model, a 35.8M-parameter GPT-style language model trained for 7,000 steps on Wikipedia passages from the SQuAD dataset. It is the latest iteration of our small model architecture, with extended training.

📊 Model Specifications

  • Architecture: GPT-style Transformer
  • Parameters: 35,823,616 (35.8M)
  • Layers: 6 transformer layers
  • Heads: 8 attention heads
  • Embedding Dimension: 512
  • Vocabulary Size: 32,000 tokens
  • Context Length: 1,024 tokens
  • Training Steps: 7,000
  • Model Size: Small

🎯 Training Details

  • Dataset: Wikipedia passages from the SQuAD dataset (~41k passages)
  • Tokenization: SentencePiece with a 32k vocabulary
  • Training Objective: Next-token prediction (causal language modeling; sketched in the loop below)
  • Optimizer: AdamW with learning rate scheduling
  • Hardware: Trained on a consumer GPU with gradient accumulation
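
The loop below is a minimal sketch of that setup: next-token prediction with AdamW, a learning-rate schedule, and gradient accumulation, assuming the Hugging Face-style model interface used in the Usage section. The `batches` iterable, the accumulation count, and the hyperparameters are illustrative placeholders, not the values from the actual training run.

# Illustrative training loop, not the project's training script.
import torch
import torch.nn.functional as F
from torch.optim import AdamW

def train(model, batches, total_steps=7000, accum_steps=8, lr=3e-4, device="cuda"):
    model.train().to(device)
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=0.1)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

    optimizer.zero_grad()
    for step, input_ids in enumerate(batches):        # input_ids: (batch, seq_len)
        input_ids = input_ids.to(device)
        logits = model(input_ids).logits               # (batch, seq_len, vocab_size)
        # Shift so position t predicts token t+1 (causal language modeling).
        loss = F.cross_entropy(
            logits[:, :-1, :].reshape(-1, logits.size(-1)),
            input_ids[:, 1:].reshape(-1),
        )
        (loss / accum_steps).backward()                # accumulate gradients
        if (step + 1) % accum_steps == 0:
            optimizer.step()                           # one update per accumulation window
            scheduler.step()
            optimizer.zero_grad()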

📁 Model Files

huggingface/
├── config.json              # Model configuration
├── generation_config.json   # Generation parameters
├── pytorch_model.bin        # Model weights (161MB)
├── tokenizer_config.json    # Tokenizer configuration
├── tokenizer.model          # SentencePiece tokenizer
└── load_hf_model.py         # Loading script
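
As a quick sanity check, tokenizer.model can be opened directly with the sentencepiece library to confirm the 32k vocabulary; the path below assumes you are in the directory containing huggingface/.

# Inspect the SentencePiece tokenizer shipped with the model.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="huggingface/tokenizer.model")
print(sp.vocab_size())                                            # expected: 32000
print(sp.encode("The history of artificial intelligence", out_type=str))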

🚀 Usage

Loading with Hugging Face Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "path/to/huggingface"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Using the Custom Loader

from load_hf_model import load_openllm_model

# Load the model using our custom loader
model, tokenizer = load_openllm_model("path/to/huggingface")

# Generate text
prompt = "Explain quantum computing in simple terms"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.8,
    top_p=0.9
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Inference Server

# Start the FastAPI inference server
python core/src/inference_server.py \
    --model_path exports/huggingface-7k/huggingface \
    --port 8000

# Make API calls
curl -X POST "http://localhost:8000/generate" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "The future of renewable energy",
        "max_tokens": 100,
        "temperature": 0.7
    }'
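
The same call can be made from Python with the requests library. The request body mirrors the curl example above; the exact shape of the JSON response depends on the server implementation, so the final print is only illustrative.

# Minimal Python client for the /generate endpoint shown above.
import requests

response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "The future of renewable energy",
        "max_tokens": 100,
        "temperature": 0.7,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())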

📈 Performance

Training Metrics

  • Final Loss: ~2.1 (cross-entropy)
  • Training Time: ~7 hours on a consumer GPU
  • Memory Usage: ~2GB VRAM during training
  • Inference Speed: ~50 tokens/second on CPU, ~200 tokens/second on GPU
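
A rough way to reproduce the tokens/second numbers on your own hardware is to time a single generate call. The snippet assumes model and tokenizer have been loaded as in the Usage section; actual throughput varies with batch size, precision, and hardware.

# Measure generation throughput (tokens/second) for one prompt.
import time
import torch

inputs = tokenizer("The history of artificial intelligence", return_tensors="pt").to(model.device)

start = time.perf_counter()
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs.input_ids.shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/second")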

Model Capabilities

  • Text Generation: Coherent paragraph generation
  • Question Answering: Basic factual responses
  • Summarization: Short text summarization
  • Language Understanding: Context-aware responses

🔧 Configuration

Generation Parameters

{
  "max_length": 512,
  "max_new_tokens": 256,
  "temperature": 0.7,
  "top_k": 40,
  "top_p": 0.9,
  "do_sample": true,
  "pad_token_id": 0,
  "eos_token_id": 1,
  "bos_token_id": 2
}
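
These defaults live in generation_config.json and can be loaded and overridden programmatically. The snippet below assumes the model and the tokenized prompt (inputs) from the Usage section, and that "path/to/huggingface" points at the model directory.

# Load the shipped generation defaults and override individual fields.
from transformers import GenerationConfig

generation_config = GenerationConfig.from_pretrained("path/to/huggingface")
generation_config.max_new_tokens = 64          # override one field for this call

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))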

Model Architecture

{
  "vocab_size": 32000,
  "n_layer": 6,
  "n_head": 8,
  "n_embd": 512,
  "block_size": 1024,
  "dropout": 0.1,
  "bias": true
}
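
The 35.8M figure in the specifications can be sanity-checked from this configuration. The arithmetic below assumes a GPT-2-style block (fused QKV projection, 4x MLP expansion, learned positional embeddings, weight-tied LM head); the actual module layout may differ slightly, but this accounting reproduces the quoted 35,823,616 parameters.

# Back-of-the-envelope parameter count for the configuration above.
cfg = {"vocab_size": 32000, "n_layer": 6, "n_head": 8, "n_embd": 512,
       "block_size": 1024, "bias": True}

d = cfg["n_embd"]
b = 1 if cfg["bias"] else 0
per_layer = (
    d * 3 * d + b * 3 * d      # fused Q/K/V projection
    + d * d + b * d            # attention output projection
    + d * 4 * d + b * 4 * d    # MLP up-projection (4x expansion)
    + 4 * d * d + b * d        # MLP down-projection
    + 2 * (d + d)              # two LayerNorms (weight + bias)
)
total = (
    cfg["vocab_size"] * d      # token embeddings (tied with the LM head)
    + cfg["block_size"] * d    # learned positional embeddings
    + cfg["n_layer"] * per_layer
    + (d + d)                  # final LayerNorm
)
print(f"{total:,}")            # 35,823,616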

🧪 Testing

Quick Test

# Test the model with a simple prompt (model and tokenizer loaded as above)
test_prompt = "Hello, how are you today?"
inputs = tokenizer(test_prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=20,
        do_sample=True,
        temperature=0.7
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Input: {test_prompt}")
print(f"Output: {response}")

📋 Limitations

  • Context Length: Limited to 1,024 tokens
  • Training Data: Only Wikipedia passages (limited domain)
  • Model Size: Small model with limited reasoning capabilities
  • Bias: May inherit biases from training data
  • Factual Accuracy: Not guaranteed for current events

🔄 Model Comparison

Model      Parameters  Training Steps  Context Length  Use Case
Small 4K   35.8M       4,000           1,024           Basic text generation
Small 6K   35.8M       6,000           1,024           Improved coherence
Small 7K   35.8M       7,000           1,024           Extended training

📄 License

This model is dual-licensed:

  • Open Source: GNU General Public License v3.0
  • Commercial: Commercial License (contact for details)

See LICENSE and docs/LICENSES.md for full license information.

🤝 Contributing

We welcome contributions to improve the model! Please see:

  • docs/CONTRIBUTING.md for contribution guidelines
  • docs/CODE_OF_CONDUCT.md for community standards

📞 Support

For questions, issues, or commercial licensing:

  • GitHub Issues: Report bugs and feature requests
  • Documentation: Check docs/ directory
  • Commercial License: Contact for enterprise use

Author: Louis Chua Bean Chong
Project: OpenLLM - Open Source Large Language Model
Version: 0.1.0
Last Updated: 2024
