Shilpaj committed · verified
Commit 37bc9d5 · 1 Parent(s): f1387d1

Feat: Updated to 124M model

Files changed (3):
  1. README.md +7 -12
  2. app.py +7 -7
  3. nano_gpt_model.pt +2 -2
README.md CHANGED
@@ -19,10 +19,10 @@ This section focuses on Embeddings and Pre-training.
 
 In this project, a GPT (decoder-only) model is trained on Shakespeare data. The model architecture follows the original GPT design with multi-head self-attention and feed-forward layers. Key specifications include:
 
-- 8 transformer layers
-- 8 attention heads
-- 384 embedding dimensions
-- 512 context window size
+- 12 transformer layers
+- 12 attention heads
+- 768 embedding dimensions
+- 1024 context window size
 - ~50k vocabulary size
 
 The model is trained using cross-entropy loss and AdamW optimizer with weight decay. Training is done on Shakespeare's works to learn the language patterns and writing style. The trained model can generate Shakespeare-style text given a prompt.
@@ -31,7 +31,7 @@ The model is trained using cross-entropy loss and AdamW optimizer with weight de
 
 ### Project Structure
 
-```
+```bash
 .
 ├── assets              # Images for README
 ├── nano_gpt_model.pt   # Trained model
@@ -45,7 +45,7 @@ The model is trained using cross-entropy loss and AdamW optimizer with weight de
 
 ### Install Dependencies
 
-```
+```bash
 pip install -r requirements.txt
 ```
 
@@ -53,7 +53,7 @@ pip install -r requirements.txt
 
 ### Run the Notebook
 
-```
+```bash
 jupyter notebook S12Trained.ipynb
 ```
 
@@ -136,9 +136,4 @@ For the ground is nothing henceforth fell executioner come
 
 
 
-### Try it out
-
-App Link: https://huggingface.co/spaces/Shilpaj/ShakespeareGPT
-
 
-![App](./assets/app.gif)
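The updated specifications (12 layers, 12 heads, 768 embedding dimensions, 1024 context, ~50k vocabulary) match GPT-2 small, which is where the "124M" in the commit message comes from. As a sanity check, a minimal parameter count, assuming the standard GPT-2 layout with tied token embeddings and LM head:

```python
# Rough parameter count for the updated config (GPT-2 small layout assumed).
n_layer, n_embd = 12, 768
vocab_size, block_size = 50257, 1024

embed = vocab_size * n_embd + block_size * n_embd  # token (wte) + position (wpe) embeddings
attn = 4 * n_embd * n_embd + 4 * n_embd            # qkv projection + output projection, weights + biases
mlp = 8 * n_embd * n_embd + 5 * n_embd             # 4x expansion: fc (768->3072) + proj (3072->768)
ln = 2 * 2 * n_embd                                # two LayerNorms (scale + shift) per block
block = attn + mlp + ln

total = embed + n_layer * block + 2 * n_embd       # plus the final LayerNorm; LM head is tied to wte
print(f"{total / 1e6:.1f}M parameters")            # 124.4M
```

With the old config (8 layers, 384 dims, 512 context) the same arithmetic lands near 35M, so the checkpoint size jump below is expected.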
app.py CHANGED
@@ -11,11 +11,11 @@ import spaces
 # Configuration class (same as in training)
 @dataclass
 class GPTConfig:
-    block_size: int = 512
-    vocab_size: int = 50304
-    n_layer: int = 8
-    n_head: int = 8
-    n_embd: int = 384
+    block_size: int = 1024
+    vocab_size: int = 50257
+    n_layer: int = 12
+    n_head: int = 12
+    n_embd: int = 768
 
 # Model architecture classes (copied from training notebook)
 class CausalSelfAttention(nn.Module):
@@ -154,8 +154,8 @@ model, device = load_model()
 demo = gr.Interface(
     fn=generate_text,
     inputs=[
-        gr.Textbox(label="Enter your prompt", value="Once upon a time"),
-        gr.Slider(minimum=1, maximum=512, value=50, step=1, label="Number of tokens to generate"),
+        gr.Textbox(label="Enter your prompt", value="Thou shalt"),
+        gr.Slider(minimum=1, maximum=1024, value=100, step=1, label="Number of tokens to generate"),
         gr.Slider(minimum=0.1, maximum=2.0, value=0.8, step=0.1, label="Temperature (higher = more random)")
     ],
     outputs=gr.Textbox(label="Generated Text"),
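The temperature slider scales the logits before sampling: values below 1 sharpen the distribution toward the most likely token, values above 1 flatten it. A minimal plain-Python sketch of that mechanism (illustrative only, not the app's actual generation loop, which runs on torch tensors):

```python
import math
import random

def sample_next(logits, temperature=0.8):
    """Sample a token index from raw logits with temperature scaling."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the categorical distribution
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

At a very low temperature this collapses to argmax; at the slider's maximum of 2.0 the model samples far more uniformly, which is why high settings read as "more random".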
nano_gpt_model.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d516ab130b427870fde8535925bf2be0fa609cb0437d384a817b346ba5411944
-size 143269482
+oid sha256:c1799438dae530e76e501535d8c2431c7658ed6354d9dd537dcb6c3c1ac86ab8
+size 548148666
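The checkpoint grows from ~143 MB to ~548 MB, which is consistent with the larger config. A rough accounting, assuming float32 tensors and nanoGPT-style per-layer causal-mask buffers in the saved state dict (the exact checkpoint contents are an assumption; the small remainder would be serialization overhead):

```python
params = 124_439_808                      # GPT-2 small parameter count for this config
mask_buffers = 12 * 1024 * 1024           # one (block_size x block_size) causal mask per layer
total_bytes = 4 * (params + mask_buffers) # float32 = 4 bytes per element
print(total_bytes)                        # 548_090_880, near the 548_148_666-byte file
```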