---
title: ShakespeareGPT
emoji: π
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
short_description: GPT model pre-training step on Shakespeare dataset
---
# ShakespeareGPT
This project focuses on embeddings and pre-training.

In this project, a decoder-only GPT model is pre-trained on a Shakespeare corpus. The architecture follows the original GPT design, with multi-head self-attention and feed-forward layers in each block. Key specifications:
- 12 transformer layers
- 12 attention heads
- 768 embedding dimensions
- 1024 context window size
- ~50k vocabulary size

The model is trained with cross-entropy loss and the AdamW optimizer with weight decay. Training on Shakespeare's works teaches the model the corpus's language patterns and writing style, so the trained model can generate Shakespeare-style text from a prompt.
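For concreteness, here is a minimal sketch of that configuration and one training step. The `GPTConfig` dataclass, the placeholder model, and the `weight_decay` value are illustrative assumptions, not code from the notebook; only the layer/head/dimension counts and the learning rate (visible in the training logs below) come from this README.

```python
import torch
import torch.nn as nn
from dataclasses import dataclass

# Hypothetical config mirroring the specs listed above (GPT-2 small scale).
@dataclass
class GPTConfig:
    n_layer: int = 12        # transformer layers
    n_head: int = 12         # attention heads
    n_embd: int = 768        # embedding dimensions
    block_size: int = 1024   # context window
    vocab_size: int = 50257  # ~50k tokens (GPT-2 BPE vocabulary)

cfg = GPTConfig()

# Stand-in for the notebook's GPT model, just to make the training-step
# wiring below executable; the real model stacks attention blocks.
model = nn.Sequential(
    nn.Embedding(cfg.vocab_size, cfg.n_embd),
    nn.Linear(cfg.n_embd, cfg.vocab_size),
)

# lr matches the logs below; weight_decay=0.1 is an assumed value.
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5, weight_decay=0.1)

x = torch.randint(0, cfg.vocab_size, (4, 32))  # (B, T) input token ids
y = torch.randint(0, cfg.vocab_size, (4, 32))  # (B, T) next-token targets

logits = model(x)                              # (B, T, vocab_size)
loss = nn.functional.cross_entropy(
    logits.view(-1, cfg.vocab_size),           # flatten to (B*T, vocab_size)
    y.view(-1),                                # flatten to (B*T,)
)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```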
### Project Structure
```bash
.
├── assets              # Images for README
├── nano_gpt_model.pt   # Trained model
├── S12Trained.ipynb    # Notebook for training
├── input.txt           # Shakespeare data
├── README.md           # This file
└── requirements.txt    # Dependencies
```
### Install Dependencies
```bash
pip install -r requirements.txt
```
### Run the Notebook
```bash
jupyter notebook S12Trained.ipynb
```
### Training Logs
Training logs from a selection of steps are shown below:
```bash
GPU Memory: 0.68GB / 1.77GB
step 10,000 | loss: 0.5863 | lr: 6.00e-05 | dt: 684.74ms | tok/sec: 5981.86 | norm: 3.94
GPU Memory: 0.68GB / 1.77GB
step 10,100 | loss: 0.5372 | lr: 6.00e-05 | dt: 687.72ms | tok/sec: 5955.88 | norm: 3.74
GPU Memory: 0.67GB / 1.77GB
step 10,200 | loss: 0.6054 | lr: 6.00e-05 | dt: 685.72ms | tok/sec: 5973.31 | norm: 5.71
GPU Memory: 0.68GB / 1.77GB
step 10,300 | loss: 0.5850 | lr: 6.00e-05 | dt: 686.01ms | tok/sec: 5970.77 | norm: 4.36
GPU Memory: 0.68GB / 1.77GB
step 10,400 | loss: 0.3319 | lr: 6.00e-05 | dt: 684.77ms | tok/sec: 5981.53 | norm: 4.68
GPU Memory: 0.68GB / 1.77GB
step 10,500 | loss: 0.4140 | lr: 6.00e-05 | dt: 684.41ms | tok/sec: 5984.70 | norm: 3.21
GPU Memory: 0.68GB / 1.77GB
step 10,600 | loss: 0.4008 | lr: 6.00e-05 | dt: 683.34ms | tok/sec: 5994.10 | norm: 3.58
GPU Memory: 0.68GB / 1.77GB
step 10,700 | loss: 0.3951 | lr: 6.00e-05 | dt: 685.49ms | tok/sec: 5975.26 | norm: 3.81
GPU Memory: 0.68GB / 1.77GB
step 10,800 | loss: 0.3022 | lr: 6.00e-05 | dt: 687.40ms | tok/sec: 5958.64 | norm: 3.06
GPU Memory: 0.68GB / 1.77GB
step 10,900 | loss: 0.4287 | lr: 6.00e-05 | dt: 686.75ms | tok/sec: 5964.31 | norm: 3.60
GPU Memory: 0.68GB / 1.77GB
step 11,000 | loss: 0.2447 | lr: 6.00e-05 | dt: 687.35ms | tok/sec: 5959.12 | norm: 3.35
GPU Memory: 0.68GB / 1.77GB
step 11,100 | loss: 0.2773 | lr: 6.00e-05 | dt: 688.83ms | tok/sec: 5946.35 | norm: 2.71
GPU Memory: 0.67GB / 1.77GB
step 11,200 | loss: 0.2839 | lr: 6.00e-05 | dt: 687.56ms | tok/sec: 5957.31 | norm: 3.90
GPU Memory: 0.68GB / 1.77GB
step 11,300 | loss: 0.3481 | lr: 6.00e-05 | dt: 684.68ms | tok/sec: 5982.32 | norm: 3.68
GPU Memory: 0.78GB / 1.77GB
step 11,400 | loss: 0.1913 | lr: 6.00e-05 | dt: 685.73ms | tok/sec: 5973.18 | norm: 2.93
GPU Memory: 0.68GB / 1.77GB
step 11,500 | loss: 0.2605 | lr: 6.00e-05 | dt: 685.74ms | tok/sec: 5973.11 | norm: 2.96
GPU Memory: 0.68GB / 1.77GB
step 11,600 | loss: 0.2029 | lr: 6.00e-05 | dt: 689.04ms | tok/sec: 5944.49 | norm: 2.84
Reached target loss! Final loss: 0.0889 at step 11,663
Model saved to gpt_model.pt
```
### Model Output
```bash
Once upon a time to not;
More slaughter'd, sweet Rivers, I receive my children, and title with pardon hither
That one stuff'd with a conquest; and teeth, of my? Why, in life thee,
Which now not joy of foe, thought o'n slaughter bed,
And, is mine own soul me, not so heavy in every day:
The tyrant from one curst my death lies;
For the ground is nothing henceforth fell executioner come
```
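Text like the sample above is typically produced with an autoregressive sampling loop: encode the prompt, repeatedly sample the next token from the model's output distribution, and decode the result. A minimal sketch follows; the `model` (assumed to return next-token logits) and the tiktoken-style `enc` encoder/decoder are hypothetical stand-ins, not names taken from the notebook.

```python
import torch

@torch.no_grad()
def generate(model, enc, prompt, max_new_tokens=100, temperature=1.0):
    # Encode the prompt into a (1, T) tensor of token ids.
    idx = torch.tensor([enc.encode(prompt)])
    for _ in range(max_new_tokens):
        # Feed at most the last 1024 tokens (the model's context window).
        logits = model(idx[:, -1024:])                        # (1, T, vocab)
        # Sample the next token from the softmax over the final position.
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)    # (1, 1)
        idx = torch.cat([idx, next_tok], dim=1)
    return enc.decode(idx[0].tolist())

# Example usage: generate(model, enc, "Once upon a time")
```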