---

language:
- en
license:
- gpl-3.0
- other
tags:
- text-generation
- language-model
- gpt
- transformer
- open-source
- squad
- wikipedia
datasets:
- squad
metrics:
- perplexity
- text-generation-quality
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: OpenLLM Small Extended 6k
  results:
  - task:
      type: text-generation
    dataset:
      type: squad
      name: SQuAD Wikipedia Passages
    metrics:
      - type: perplexity
        value: 816.04
      - type: training_loss
        value: 5.4302
---


# OpenLLM Small Extended 6k

This is the OpenLLM Small Extended model, a 35.8M-parameter GPT-style transformer trained for 6,000 steps on Wikipedia passages from the SQuAD dataset.

## Model Details

- **Model Type:** GPT-style Transformer
- **Architecture:** Small (35.8M parameters)
- **Training Steps:** 6,000
- **Training Data:** ~41k Wikipedia passages from the SQuAD dataset
- **Tokenizer:** SentencePiece BPE (32k vocabulary)
- **License:** GPL-3.0 (Open Source) / Commercial License available

## Model Performance

- **Final Training Loss:** 5.4302
- **Perplexity:** 816.04 (as listed in this card's model-index metadata)
- **Model Parameters:** 35,823,616
- **Context Length:** 512 tokens
- **Training Hardware:** CPU/GPU compatible
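
Perplexity is the exponential of the mean per-token cross-entropy loss. The sketch below applies that identity to the figures on this card. It assumes the reported loss is a natural-log cross-entropy averaged over tokens, which the card does not state explicitly; the helper names are illustrative and not part of the OpenLLM codebase.

```python
import math


def perplexity_from_loss(loss: float) -> float:
    """Perplexity implied by a mean per-token cross-entropy loss (in nats)."""
    return math.exp(loss)


def loss_from_perplexity(perplexity: float) -> float:
    """Mean per-token loss (in nats) implied by a perplexity value."""
    return math.log(perplexity)


train_loss = 5.4302    # final training loss from this card
reported_ppl = 816.04  # perplexity from this card's model-index metadata

print(f"{perplexity_from_loss(train_loss):.1f}")    # about 228.2
print(f"{loss_from_perplexity(reported_ppl):.3f}")  # about 6.704
```

Under this assumption, the final training loss corresponds to a perplexity of roughly 228, while the perplexity of 816.04 listed in the metadata corresponds to a loss of about 6.70. The two figures were therefore probably measured on different data, for example training batches versus held-out text.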

## Usage

### Using Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "lemms/openllm-small-extended-6k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate text
prompt = "The history of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=50,
        temperature=0.7,
        top_k=40,
        do_sample=True
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

### Using the Custom Loader

```python
# Use the provided load_hf_model.py script
from load_hf_model import load_model_and_tokenizer

model, tokenizer = load_model_and_tokenizer()
# ... rest of usage
```

## Training Details

This model was trained using the OpenLLM training pipeline:

1. **Data Preparation:** SQuAD dataset processing (~41k passages)
2. **Tokenizer Training:** SentencePiece BPE with 32k vocabulary
3. **Model Training:** GPT-style transformer for 6,000 steps
4. **Evaluation:** Perplexity and text generation quality assessment

## Model Architecture

- **Layers:** 12 transformer layers
- **Attention Heads:** 12
- **Hidden Size:** 768
- **Intermediate Size:** 3072
- **Activation:** GELU
- **Layer Norm:** Pre-norm
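
As a rough cross-check, the parameter count of a GPT-style model can be estimated from its dimensions. The sketch below assumes a GPT-2 style layout (learned position embeddings, biases on all linear layers, two LayerNorms per block plus a final one, output head tied to the token embeddings) and a vocabulary of exactly 32,000 tokens. None of these details is confirmed by this card, and `estimate_gpt_params` is a hypothetical helper rather than part of the OpenLLM codebase.

```python
def estimate_gpt_params(vocab_size, n_positions, d_model, n_layers, d_ff):
    """Rough parameter count for a GPT-2 style decoder (assumed layout)."""
    # Token and learned position embeddings; the output head is assumed tied.
    embeddings = vocab_size * d_model + n_positions * d_model
    # Attention: fused QKV projection plus output projection, with biases.
    attention = (3 * d_model * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward: two linear layers with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per block, each with a weight and a bias vector.
    layer_norms = 2 * 2 * d_model
    per_layer = attention + feed_forward + layer_norms
    # Final LayerNorm after the last block.
    final_norm = 2 * d_model
    return embeddings + n_layers * per_layer + final_norm


# Dimensions as listed in this section, with the 512-token context length.
listed = estimate_gpt_params(32_000, 512, 768, 12, 3072)
print(f"{listed:,}")  # 110,025,216

# A purely hypothetical smaller configuration, for comparison only.
hypothetical = estimate_gpt_params(32_000, 1024, 512, 6, 2048)
print(f"{hypothetical:,}")  # 35,823,616
```

Under these assumptions the dimensions listed above give about 110M parameters, well above the 35,823,616 stated elsewhere in this card, whereas the hypothetical 6-layer, 512-wide configuration with 1,024 positions reproduces that count exactly. The configuration file shipped with the checkpoint is the authoritative source for the actual dimensions.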

## Limitations

- **Training Data:** Limited to ~41k Wikipedia passages from SQuAD
- **Context Length:** 512 tokens maximum
- **Model Size:** Small model with 35.8M parameters
- **Performance:** Basic text generation capabilities (final training loss 5.4302)
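
Because of the 512-token limit, the prompt and the generated continuation together need to fit in the context window. The sketch below shows one way to trim a tokenized prompt before generation, keeping the most recent tokens. `fit_to_context` is a hypothetical helper, not part of the OpenLLM codebase, and keeping the tail of the prompt is only one possible truncation policy.

```python
def fit_to_context(token_ids, max_new_tokens, context_length=512):
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    if max_new_tokens >= context_length:
        raise ValueError("max_new_tokens must be smaller than context_length")
    # Number of prompt tokens that still leaves room for the new tokens.
    budget = context_length - max_new_tokens
    return token_ids[-budget:]


prompt_ids = list(range(600))  # stand-in for a 600-token prompt
trimmed = fit_to_context(prompt_ids, max_new_tokens=50)
print(len(trimmed))  # 462
print(trimmed[0])    # 138
```

Prompts shorter than the budget pass through unchanged, so the helper can be applied unconditionally before calling `generate`.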

## License

This model is dual-licensed:
- **Open Source:** GPL-3.0 for research and community use
- **Commercial:** Commercial license available for enterprise use

For commercial licensing, contact: [email protected]

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{openllm2024,
  title={OpenLLM: Open Source Large Language Model},
  author={Louis Chua Bean Chong},
  year={2024},
  url={https://github.com/louischua/openllm}
}
```

## Links

- **Repository:** https://github.com/louischua/openllm
- **Documentation:** https://github.com/louischua/openllm/docs
- **Training Pipeline:** https://github.com/louischua/openllm/docs/training_pipeline.md