---
title: ShakespeareGPT
emoji: 🐠
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
short_description: 'GPT model pre-training step on Shakespeare dataset'
---

# ShakespeareGPT

This section focuses on Embeddings and Pre-training.

![LLM Training Steps](./assets/LLMfromScratch2.png)

In this project, a GPT (decoder-only) model is trained on Shakespeare data. The model architecture follows the original GPT design with multi-head self-attention and feed-forward layers. Key specifications include:

- 12 transformer layers
- 12 attention heads 
- 768 embedding dimensions
- 1024 context window size
- ~50k vocabulary size
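
These hyperparameters roughly match the GPT-2 (124M) configuration. As a rough sketch, they could be captured in a config object like the following (the `GPTConfig` name and its fields are illustrative, not taken from the notebook):

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    block_size: int = 1024    # context window size
    vocab_size: int = 50257   # ~50k-token GPT-2 BPE vocabulary
    n_layer: int = 12         # transformer layers
    n_head: int = 12          # attention heads per layer
    n_embd: int = 768         # embedding dimensions
```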

The model is trained using cross-entropy loss and the AdamW optimizer with weight decay. Training on Shakespeare's works lets the model learn the corpus's language patterns and writing style, so the trained model can generate Shakespeare-style text given a prompt.
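
A minimal sketch of that training loop, assuming a PyTorch `model` that maps `(B, T)` token ids to `(B, T, vocab_size)` logits; `get_batch`, `max_steps`, and the weight-decay value are placeholders, while the 6e-5 learning rate matches the logs below:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5, weight_decay=0.1)

for step in range(max_steps):
    x, y = get_batch("train")              # input tokens and next-token targets, both (B, T)
    logits = model(x)                      # (B, T, vocab_size)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```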



### Project Structure

```bash
.
β”œβ”€β”€ assets              # Images for README
β”œβ”€β”€ nano_gpt_model.pt   # Trained model
β”œβ”€β”€ S12Trained.ipynb    # Notebook for training
β”œβ”€β”€ input.txt           # Shakespeare data
β”œβ”€β”€ README.md           # This file
└── requirements.txt    # Dependencies
```



### Install Dependencies

```bash
pip install -r requirements.txt
```



### Run the Notebook

```bash
jupyter notebook S12Trained.ipynb
```



### Training Logs

Training logs for a few steps are shown below:

```bash
GPU Memory: 0.68GB / 1.77GB
step 10,000 | loss: 0.5863 | lr: 6.00e-05 | dt: 684.74ms | tok/sec: 5981.86 | norm: 3.94
    
GPU Memory: 0.68GB / 1.77GB
step 10,100 | loss: 0.5372 | lr: 6.00e-05 | dt: 687.72ms | tok/sec: 5955.88 | norm: 3.74
    
GPU Memory: 0.67GB / 1.77GB
step 10,200 | loss: 0.6054 | lr: 6.00e-05 | dt: 685.72ms | tok/sec: 5973.31 | norm: 5.71
    
GPU Memory: 0.68GB / 1.77GB
step 10,300 | loss: 0.5850 | lr: 6.00e-05 | dt: 686.01ms | tok/sec: 5970.77 | norm: 4.36
    
GPU Memory: 0.68GB / 1.77GB
step 10,400 | loss: 0.3319 | lr: 6.00e-05 | dt: 684.77ms | tok/sec: 5981.53 | norm: 4.68
    
GPU Memory: 0.68GB / 1.77GB
step 10,500 | loss: 0.4140 | lr: 6.00e-05 | dt: 684.41ms | tok/sec: 5984.70 | norm: 3.21
    
GPU Memory: 0.68GB / 1.77GB
step 10,600 | loss: 0.4008 | lr: 6.00e-05 | dt: 683.34ms | tok/sec: 5994.10 | norm: 3.58
    
GPU Memory: 0.68GB / 1.77GB
step 10,700 | loss: 0.3951 | lr: 6.00e-05 | dt: 685.49ms | tok/sec: 5975.26 | norm: 3.81
    
GPU Memory: 0.68GB / 1.77GB
step 10,800 | loss: 0.3022 | lr: 6.00e-05 | dt: 687.40ms | tok/sec: 5958.64 | norm: 3.06
    
GPU Memory: 0.68GB / 1.77GB
step 10,900 | loss: 0.4287 | lr: 6.00e-05 | dt: 686.75ms | tok/sec: 5964.31 | norm: 3.60
    
GPU Memory: 0.68GB / 1.77GB
step 11,000 | loss: 0.2447 | lr: 6.00e-05 | dt: 687.35ms | tok/sec: 5959.12 | norm: 3.35
    
GPU Memory: 0.68GB / 1.77GB
step 11,100 | loss: 0.2773 | lr: 6.00e-05 | dt: 688.83ms | tok/sec: 5946.35 | norm: 2.71
    
GPU Memory: 0.67GB / 1.77GB
step 11,200 | loss: 0.2839 | lr: 6.00e-05 | dt: 687.56ms | tok/sec: 5957.31 | norm: 3.90
    
GPU Memory: 0.68GB / 1.77GB
step 11,300 | loss: 0.3481 | lr: 6.00e-05 | dt: 684.68ms | tok/sec: 5982.32 | norm: 3.68
    
GPU Memory: 0.78GB / 1.77GB
step 11,400 | loss: 0.1913 | lr: 6.00e-05 | dt: 685.73ms | tok/sec: 5973.18 | norm: 2.93
    
GPU Memory: 0.68GB / 1.77GB
step 11,500 | loss: 0.2605 | lr: 6.00e-05 | dt: 685.74ms | tok/sec: 5973.11 | norm: 2.96
    
GPU Memory: 0.68GB / 1.77GB
step 11,600 | loss: 0.2029 | lr: 6.00e-05 | dt: 689.04ms | tok/sec: 5944.49 | norm: 2.84
    

Reached target loss! Final loss: 0.0889 at step 11,663
Model saved to gpt_model.pt
```
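
Each log line combines step time, throughput, gradient norm, and GPU memory. A hedged sketch of how these metrics might be computed inside the loop (not the notebook's exact code; `B`, `T`, `lr`, and the clipping threshold are assumptions):

```python
import time
import torch

t0 = time.time()
# ... forward / backward pass as in the loop above ...
norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # returns the total gradient norm
optimizer.step()
torch.cuda.synchronize()                      # wait for GPU work before stopping the clock
dt = (time.time() - t0) * 1000                # step duration in milliseconds
tok_per_sec = (B * T) / (dt / 1000)           # tokens processed per second
used = torch.cuda.memory_allocated() / 1e9
total = torch.cuda.get_device_properties(0).total_memory / 1e9

print(f"GPU Memory: {used:.2f}GB / {total:.2f}GB")
print(f"step {step:,} | loss: {loss.item():.4f} | lr: {lr:.2e} | "
      f"dt: {dt:.2f}ms | tok/sec: {tok_per_sec:.2f} | norm: {norm:.2f}")
```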



### Model Output

```bash
Once upon a time to not;
More slaughter'd, sweet Rivers, I receive my children, and title with pardon hither
That one stuff'd with a conquest; and teeth, of my? Why, in life thee,
Which now not joy of foe, thought o'n slaughter bed,
And, is mine own soul me, not so heavy in every day:
The tyrant from one curst my death lies;
For the ground is nothing henceforth fell executioner come
```