OpenLLM Small Extended 7K Model

🌟 Model Overview

This is the OpenLLM Small Extended 7K model, a 35.8M-parameter GPT-style language model trained for 7,000 steps on Wikipedia passages from the SQuAD dataset. It is the latest iteration of our small model architecture, with extended training.

📊 Model Specifications

  • Architecture: GPT-style Transformer
  • Parameters: 35,823,616 (35.8M)
  • Layers: 6 transformer layers
  • Heads: 8 attention heads
  • Embedding Dimension: 512
  • Vocabulary Size: 32,000 tokens
  • Context Length: 1,024 tokens
  • Training Steps: 7,000
  • Model Size: Small

🎯 Training Details

  • Dataset: Wikipedia passages from the SQuAD dataset (~41k passages)
  • Tokenization: SentencePiece with a 32k vocabulary
  • Training Objective: Next-token prediction (causal language modeling; sketched in the loop below)
  • Optimizer: AdamW with learning rate scheduling
  • Hardware: Trained on a consumer GPU with gradient accumulation
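
The loop below is a minimal sketch of that setup: next-token prediction with AdamW, a learning-rate schedule, and gradient accumulation, assuming the Hugging Face-style model interface used in the Usage section. The `batches` iterable, the accumulation count, and the hyperparameters are illustrative placeholders, not the values from the actual training run.

# Illustrative training loop, not the project's training script.
import torch
import torch.nn.functional as F
from torch.optim import AdamW

def train(model, batches, total_steps=7000, accum_steps=8, lr=3e-4, device="cuda"):
    model.train().to(device)
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=0.1)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)

    optimizer.zero_grad()
    for step, input_ids in enumerate(batches):        # input_ids: (batch, seq_len)
        input_ids = input_ids.to(device)
        logits = model(input_ids).logits               # (batch, seq_len, vocab_size)
        # Shift so position t predicts token t+1 (causal language modeling).
        loss = F.cross_entropy(
            logits[:, :-1, :].reshape(-1, logits.size(-1)),
            input_ids[:, 1:].reshape(-1),
        )
        (loss / accum_steps).backward()                # accumulate gradients
        if (step + 1) % accum_steps == 0:
            optimizer.step()                           # one update per accumulation window
            scheduler.step()
            optimizer.zero_grad()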

📁 Model Files

huggingface/
├── config.json              # Model configuration
├── generation_config.json   # Generation parameters
├── pytorch_model.bin        # Model weights (161MB)
├── tokenizer_config.json    # Tokenizer configuration
├── tokenizer.model          # SentencePiece tokenizer
└── load_hf_model.py         # Loading script
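
As a quick sanity check, tokenizer.model can be opened directly with the sentencepiece library to confirm the 32k vocabulary; the path below assumes you are in the directory containing huggingface/.

# Inspect the SentencePiece tokenizer shipped with the model.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="huggingface/tokenizer.model")
print(sp.vocab_size())                                            # expected: 32000
print(sp.encode("The history of artificial intelligence", out_type=str))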

🚀 Usage

Loading with Hugging Face Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "path/to/huggingface"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)

Using the Custom Loader

from load_hf_model import load_openllm_model

# Load the model using our custom loader
model, tokenizer = load_openllm_model("path/to/huggingface")

# Generate text
prompt = "Explain quantum computing in simple terms"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    inputs.input_ids,
    max_new_tokens=150,
    do_sample=True,
    temperature=0.8,
    top_p=0.9
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Inference Server

# Start the FastAPI inference server
python core/src/inference_server.py \
    --model_path exports/huggingface-7k/huggingface \
    --port 8000

# Make API calls
curl -X POST "http://localhost:8000/generate" \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "The future of renewable energy",
        "max_tokens": 100,
        "temperature": 0.7
    }'
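
The same call can be made from Python with the requests library. The request body mirrors the curl example above; the exact shape of the JSON response depends on the server implementation, so the final print is only illustrative.

# Minimal Python client for the /generate endpoint shown above.
import requests

response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "The future of renewable energy",
        "max_tokens": 100,
        "temperature": 0.7,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())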

📈 Performance

Training Metrics

  • Final Loss: ~2.1 (cross-entropy)
  • Training Time: ~7 hours on a consumer GPU
  • Memory Usage: ~2GB VRAM during training
  • Inference Speed: ~50 tokens/second on CPU, ~200 tokens/second on GPU
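
A rough way to reproduce the tokens/second numbers on your own hardware is to time a single generate call. The snippet assumes model and tokenizer have been loaded as in the Usage section; actual throughput varies with batch size, precision, and hardware.

# Measure generation throughput (tokens/second) for one prompt.
import time
import torch

inputs = tokenizer("The history of artificial intelligence", return_tensors="pt").to(model.device)

start = time.perf_counter()
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs.input_ids.shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/second")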

Model Capabilities

  • Text Generation: Coherent paragraph generation
  • Question Answering: Basic factual responses
  • Summarization: Short text summarization
  • Language Understanding: Context-aware responses

🔧 Configuration

Generation Parameters

{
  "max_length": 512,
  "max_new_tokens": 256,
  "temperature": 0.7,
  "top_k": 40,
  "top_p": 0.9,
  "do_sample": true,
  "pad_token_id": 0,
  "eos_token_id": 1,
  "bos_token_id": 2
}
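
These defaults live in generation_config.json and can be loaded and overridden programmatically. The snippet below assumes the model and the tokenized prompt (inputs) from the Usage section, and that "path/to/huggingface" points at the model directory.

# Load the shipped generation defaults and override individual fields.
from transformers import GenerationConfig

generation_config = GenerationConfig.from_pretrained("path/to/huggingface")
generation_config.max_new_tokens = 64          # override one field for this call

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))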

Model Architecture

{
  "vocab_size": 32000,
  "n_layer": 6,
  "n_head": 8,
  "n_embd": 512,
  "block_size": 1024,
  "dropout": 0.1,
  "bias": true
}
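
The 35.8M figure in the specifications can be sanity-checked from this configuration. The arithmetic below assumes a GPT-2-style block (fused QKV projection, 4x MLP expansion, learned positional embeddings, weight-tied LM head); the actual module layout may differ slightly, but this accounting reproduces the quoted 35,823,616 parameters.

# Back-of-the-envelope parameter count for the configuration above.
cfg = {"vocab_size": 32000, "n_layer": 6, "n_head": 8, "n_embd": 512,
       "block_size": 1024, "bias": True}

d = cfg["n_embd"]
b = 1 if cfg["bias"] else 0
per_layer = (
    d * 3 * d + b * 3 * d      # fused Q/K/V projection
    + d * d + b * d            # attention output projection
    + d * 4 * d + b * 4 * d    # MLP up-projection (4x expansion)
    + 4 * d * d + b * d        # MLP down-projection
    + 2 * (d + d)              # two LayerNorms (weight + bias)
)
total = (
    cfg["vocab_size"] * d      # token embeddings (tied with the LM head)
    + cfg["block_size"] * d    # learned positional embeddings
    + cfg["n_layer"] * per_layer
    + (d + d)                  # final LayerNorm
)
print(f"{total:,}")            # 35,823,616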

🧪 Testing

Quick Test

# Test the model with a simple prompt (model and tokenizer loaded as above)
test_prompt = "Hello, how are you today?"
inputs = tokenizer(test_prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=20,
        do_sample=True,
        temperature=0.7
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Input: {test_prompt}")
print(f"Output: {response}")

📋 Limitations

  • Context Length: Limited to 1,024 tokens
  • Training Data: Only Wikipedia passages (limited domain)
  • Model Size: Small model with limited reasoning capabilities
  • Bias: May inherit biases from training data
  • Factual Accuracy: Not guaranteed for current events

🔄 Model Comparison

Model      Parameters  Training Steps  Context Length  Use Case
Small 4K   35.8M       4,000           1,024           Basic text generation
Small 6K   35.8M       6,000           1,024           Improved coherence
Small 7K   35.8M       7,000           1,024           Extended training

📄 License

This model is dual-licensed:

  • Open Source: GNU General Public License v3.0
  • Commercial: Commercial License (contact for details)

See LICENSE and docs/LICENSES.md for full license information.

🤝 Contributing

We welcome contributions to improve the model! Please see:

  • docs/CONTRIBUTING.md for contribution guidelines
  • docs/CODE_OF_CONDUCT.md for community standards

📞 Support

For questions, issues, or commercial licensing:

  • GitHub Issues: Report bugs and feature requests
  • Documentation: Check docs/ directory
  • Commercial License: Contact for enterprise use

Author: Louis Chua Bean Chong
Project: OpenLLM - Open Source Large Language Model
Version: 0.1.0
Last Updated: 2024
