---
title: ShakespeareGPT
emoji: π
colorFrom: gray
colorTo: red
sdk: gradio
sdk_version: 5.12.0
app_file: app.py
pinned: false
license: mit
short_description: GPT model pre-training on the Shakespeare dataset
---
# ShakespeareGPT
This project focuses on embeddings and pre-training: a GPT (decoder-only) model is trained from scratch on Shakespeare data. The architecture follows the original GPT design, with multi-head self-attention and feed-forward layers in each block. Key specifications:
- 12 transformer layers
- 12 attention heads
- 768 embedding dimensions
- 1024 context window size
- ~50k vocabulary size
The model is trained with cross-entropy loss and the AdamW optimizer with weight decay. Training on Shakespeare's works teaches the model the language patterns and writing style, so the trained model can generate Shakespeare-style text from a prompt.
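
For illustration, the sketch below shows one way these specifications translate into a minimal decoder-only model in PyTorch. It is a simplified stand-in for the notebook's implementation, not the project's actual code: the class names (`GPTConfig`, `Block`, `GPT`) are illustrative, the exact vocabulary size and weight-decay value are assumptions, and the learning rate is taken from the training logs below.

```python
from dataclasses import dataclass

import torch
import torch.nn as nn
import torch.nn.functional as F

@dataclass
class GPTConfig:
    n_layer: int = 12         # transformer layers
    n_head: int = 12          # attention heads
    n_embd: int = 768         # embedding dimensions
    block_size: int = 1024    # context window size
    vocab_size: int = 50257   # ~50k tokens (GPT-2 BPE size, assumed)

class Block(nn.Module):
    """One pre-norm transformer block: causal self-attention + feed-forward."""
    def __init__(self, cfg):
        super().__init__()
        self.ln1 = nn.LayerNorm(cfg.n_embd)
        self.attn = nn.MultiheadAttention(cfg.n_embd, cfg.n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(cfg.n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(cfg.n_embd, 4 * cfg.n_embd),
            nn.GELU(),
            nn.Linear(4 * cfg.n_embd, cfg.n_embd),
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean mask: True above the diagonal blocks attention to future tokens
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + a
        return x + self.mlp(self.ln2(x))

class GPT(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg.vocab_size, cfg.n_embd)
        self.pos_emb = nn.Embedding(cfg.block_size, cfg.n_embd)
        self.blocks = nn.ModuleList(Block(cfg) for _ in range(cfg.n_layer))
        self.ln_f = nn.LayerNorm(cfg.n_embd)
        self.head = nn.Linear(cfg.n_embd, cfg.vocab_size, bias=False)

    def forward(self, idx, targets=None):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        logits = self.head(self.ln_f(x))
        loss = None
        if targets is not None:
            # Cross-entropy over all positions, as described above
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss

cfg = GPTConfig()
model = GPT(cfg)
# AdamW with weight decay; lr matches the logged 6.00e-05, decay value is assumed
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5, weight_decay=0.1)
```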
## Project Structure

```text
.
├── assets              # Images for README
├── nano_gpt_model.pt   # Trained model
├── S12Trained.ipynb    # Notebook for training
├── input.txt           # Shakespeare data
├── README.md           # This file
└── requirements.txt    # Dependencies
```
## Install Dependencies

```bash
pip install -r requirements.txt
```
## Run the Notebook

```bash
jupyter notebook S12Trained.ipynb
```
## Training Logs

Training logs for a few steps are shown below:
```text
GPU Memory: 0.68GB / 1.77GB
step 10,000 | loss: 0.5863 | lr: 6.00e-05 | dt: 684.74ms | tok/sec: 5981.86 | norm: 3.94
GPU Memory: 0.68GB / 1.77GB
step 10,100 | loss: 0.5372 | lr: 6.00e-05 | dt: 687.72ms | tok/sec: 5955.88 | norm: 3.74
GPU Memory: 0.67GB / 1.77GB
step 10,200 | loss: 0.6054 | lr: 6.00e-05 | dt: 685.72ms | tok/sec: 5973.31 | norm: 5.71
GPU Memory: 0.68GB / 1.77GB
step 10,300 | loss: 0.5850 | lr: 6.00e-05 | dt: 686.01ms | tok/sec: 5970.77 | norm: 4.36
GPU Memory: 0.68GB / 1.77GB
step 10,400 | loss: 0.3319 | lr: 6.00e-05 | dt: 684.77ms | tok/sec: 5981.53 | norm: 4.68
GPU Memory: 0.68GB / 1.77GB
step 10,500 | loss: 0.4140 | lr: 6.00e-05 | dt: 684.41ms | tok/sec: 5984.70 | norm: 3.21
GPU Memory: 0.68GB / 1.77GB
step 10,600 | loss: 0.4008 | lr: 6.00e-05 | dt: 683.34ms | tok/sec: 5994.10 | norm: 3.58
GPU Memory: 0.68GB / 1.77GB
step 10,700 | loss: 0.3951 | lr: 6.00e-05 | dt: 685.49ms | tok/sec: 5975.26 | norm: 3.81
GPU Memory: 0.68GB / 1.77GB
step 10,800 | loss: 0.3022 | lr: 6.00e-05 | dt: 687.40ms | tok/sec: 5958.64 | norm: 3.06
GPU Memory: 0.68GB / 1.77GB
step 10,900 | loss: 0.4287 | lr: 6.00e-05 | dt: 686.75ms | tok/sec: 5964.31 | norm: 3.60
GPU Memory: 0.68GB / 1.77GB
step 11,000 | loss: 0.2447 | lr: 6.00e-05 | dt: 687.35ms | tok/sec: 5959.12 | norm: 3.35
GPU Memory: 0.68GB / 1.77GB
step 11,100 | loss: 0.2773 | lr: 6.00e-05 | dt: 688.83ms | tok/sec: 5946.35 | norm: 2.71
GPU Memory: 0.67GB / 1.77GB
step 11,200 | loss: 0.2839 | lr: 6.00e-05 | dt: 687.56ms | tok/sec: 5957.31 | norm: 3.90
GPU Memory: 0.68GB / 1.77GB
step 11,300 | loss: 0.3481 | lr: 6.00e-05 | dt: 684.68ms | tok/sec: 5982.32 | norm: 3.68
GPU Memory: 0.78GB / 1.77GB
step 11,400 | loss: 0.1913 | lr: 6.00e-05 | dt: 685.73ms | tok/sec: 5973.18 | norm: 2.93
GPU Memory: 0.68GB / 1.77GB
step 11,500 | loss: 0.2605 | lr: 6.00e-05 | dt: 685.74ms | tok/sec: 5973.11 | norm: 2.96
GPU Memory: 0.68GB / 1.77GB
step 11,600 | loss: 0.2029 | lr: 6.00e-05 | dt: 689.04ms | tok/sec: 5944.49 | norm: 2.84
Reached target loss! Final loss: 0.0889 at step 11,663
Model saved to gpt_model.pt
```
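
In these logs, `loss` is the cross-entropy from the forward pass, `lr` the learning rate, `dt` the wall-clock time per step, `tok/sec` the resulting throughput, and `norm` the gradient norm, which suggests gradient clipping is applied. A minimal sketch of a loop that would produce lines in this format, reusing the model and optimizer sketched above (the `get_batch` loader, batch size, clipping threshold, and stopping criterion are all assumptions):

```python
import time

max_steps = 12_000        # assumed step budget
target_loss = 0.09        # assumed stopping criterion (the run above stops at 0.0889)
B, T = 4, cfg.block_size  # assumed batch size; full 1024-token context

for step in range(max_steps):
    t0 = time.time()
    x, y = get_batch(B, T)                  # hypothetical loader over input.txt
    logits, loss = model(x, targets=y)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    # clip_grad_norm_ returns the pre-clip norm, matching the logged values
    norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    dt = time.time() - t0
    if step % 100 == 0:
        print(f"step {step:,} | loss: {loss.item():.4f} | "
              f"lr: {optimizer.param_groups[0]['lr']:.2e} | "
              f"dt: {dt * 1000:.2f}ms | tok/sec: {B * T / dt:.2f} | "
              f"norm: {norm.item():.2f}")
    if loss.item() < target_loss:
        print(f"Reached target loss! Final loss: {loss.item():.4f} at step {step:,}")
        torch.save(model.state_dict(), "gpt_model.pt")
        break
```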
## Model Output
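
The sample below comes from prompting the trained model (the prompt appears to be "Once upon a time"). A minimal sketch of an autoregressive sampling loop, assuming the `GPT` sketch above and GPT-2's BPE tokenizer via `tiktoken` (the notebook's actual tokenizer and checkpoint format may differ):

```python
import tiktoken
import torch

enc = tiktoken.get_encoding("gpt2")  # assumed tokenizer (~50k vocabulary)

@torch.no_grad()
def generate(model, prompt, max_new_tokens=100, temperature=1.0):
    model.eval()
    idx = torch.tensor([enc.encode(prompt)], dtype=torch.long)
    for _ in range(max_new_tokens):
        logits, _ = model(idx[:, -1024:])        # crop to the context window
        probs = torch.softmax(logits[:, -1, :] / temperature, dim=-1)
        next_tok = torch.multinomial(probs, 1)   # sample rather than argmax
        idx = torch.cat([idx, next_tok], dim=1)
    return enc.decode(idx[0].tolist())

# model.load_state_dict(torch.load("nano_gpt_model.pt"))  # checkpoint from this repo,
#                                                         # assuming it stores a state_dict
print(generate(model, "Once upon a time"))
```

Sample output from the trained model: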
```text
Once upon a time to not;
More slaughter'd, sweet Rivers, I receive my children, and title with pardon hither
That one stuff'd with a conquest; and teeth, of my? Why, in life thee,
Which now not joy of foe, thought o'n slaughter bed,
And, is mine own soul me, not so heavy in every day:
The tyrant from one curst my death lies;
For the ground is nothing henceforth fell executioner come
```