Shilpaj committed · verified
Commit 37bc9d5 · 1 Parent(s): f1387d1

Feat: Updated to 124M model

Files changed (3):
  1. README.md +7 -12
  2. app.py +7 -7
  3. nano_gpt_model.pt +2 -2
README.md CHANGED
@@ -19,10 +19,10 @@ This section focuses on Embeddings and Pre-training.
 
 In this project, a GPT (decoder-only) model is trained on Shakespeare data. The model architecture follows the original GPT design with multi-head self-attention and feed-forward layers. Key specifications include:
 
-- 8 transformer layers
-- 8 attention heads
-- 384 embedding dimensions
-- 512 context window size
+- 12 transformer layers
+- 12 attention heads
+- 768 embedding dimensions
+- 1024 context window size
 - ~50k vocabulary size
 
 The model is trained using cross-entropy loss and AdamW optimizer with weight decay. Training is done on Shakespeare's works to learn the language patterns and writing style. The trained model can generate Shakespeare-style text given a prompt.
@@ -31,7 +31,7 @@ The model is trained using cross-entropy loss and AdamW optimizer with weight de
 
 ### Project Structure
 
-```
+```bash
 .
 ├── assets              # Images for README
 ├── nano_gpt_model.pt   # Trained model
@@ -45,7 +45,7 @@ The model is trained using cross-entropy loss and AdamW optimizer with weight de
 
 ### Install Dependencies
 
-```
+```bash
 pip install -r requirements.txt
 ```
 
@@ -53,7 +53,7 @@ pip install -r requirements.txt
 
 ### Run the Notebook
 
-```
+```bash
 jupyter notebook S12Trained.ipynb
 ```
 
@@ -136,9 +136,4 @@ For the ground is nothing henceforth fell executioner come
 
 
 
-### Try it out
-
-App Link: https://huggingface.co/spaces/Shilpaj/ShakespeareGPT
-
 
-![App](./assets/app.gif)
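The updated specifications (12 layers, 12 heads, 768 embedding dimensions, 1024 context, ~50k vocabulary) match GPT-2 small, which is where the "124M" in the commit message comes from. As a sanity check, a minimal parameter count, assuming the standard GPT-2 layout with tied token embeddings and LM head:

```python
# Rough parameter count for the updated config (GPT-2 small layout assumed).
n_layer, n_embd = 12, 768
vocab_size, block_size = 50257, 1024

embed = vocab_size * n_embd + block_size * n_embd  # token (wte) + position (wpe) embeddings
attn = 4 * n_embd * n_embd + 4 * n_embd            # qkv projection + output projection, weights + biases
mlp = 8 * n_embd * n_embd + 5 * n_embd             # 4x expansion: fc (768->3072) + proj (3072->768)
ln = 2 * 2 * n_embd                                # two LayerNorms (scale + shift) per block
block = attn + mlp + ln

total = embed + n_layer * block + 2 * n_embd       # plus the final LayerNorm; LM head is tied to wte
print(f"{total / 1e6:.1f}M parameters")            # 124.4M
```

With the old config (8 layers, 384 dims, 512 context) the same arithmetic lands near 35M, so the checkpoint size jump below is expected.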
app.py CHANGED
@@ -11,11 +11,11 @@ import spaces
 # Configuration class (same as in training)
 @dataclass
 class GPTConfig:
-    block_size: int = 512
-    vocab_size: int = 50304
-    n_layer: int = 8
-    n_head: int = 8
-    n_embd: int = 384
+    block_size: int = 1024
+    vocab_size: int = 50257
+    n_layer: int = 12
+    n_head: int = 12
+    n_embd: int = 768
 
 # Model architecture classes (copied from training notebook)
 class CausalSelfAttention(nn.Module):
@@ -154,8 +154,8 @@ model, device = load_model()
 demo = gr.Interface(
     fn=generate_text,
     inputs=[
-        gr.Textbox(label="Enter your prompt", value="Once upon a time"),
-        gr.Slider(minimum=1, maximum=512, value=50, step=1, label="Number of tokens to generate"),
+        gr.Textbox(label="Enter your prompt", value="Thou shalt"),
+        gr.Slider(minimum=1, maximum=1024, value=100, step=1, label="Number of tokens to generate"),
         gr.Slider(minimum=0.1, maximum=2.0, value=0.8, step=0.1, label="Temperature (higher = more random)")
     ],
     outputs=gr.Textbox(label="Generated Text"),
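The temperature slider scales the logits before sampling: values below 1 sharpen the distribution toward the most likely token, values above 1 flatten it. A minimal plain-Python sketch of that mechanism (illustrative only, not the app's actual generation loop, which runs on torch tensors):

```python
import math
import random

def sample_next(logits, temperature=0.8):
    """Sample a token index from raw logits with temperature scaling."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling from the categorical distribution
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

At a very low temperature this collapses to argmax; at the slider's maximum of 2.0 the model samples far more uniformly, which is why high settings read as "more random".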
nano_gpt_model.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d516ab130b427870fde8535925bf2be0fa609cb0437d384a817b346ba5411944
-size 143269482
+oid sha256:c1799438dae530e76e501535d8c2431c7658ed6354d9dd537dcb6c3c1ac86ab8
+size 548148666
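The checkpoint grows from ~143 MB to ~548 MB, which is consistent with the larger config. A rough accounting, assuming float32 tensors and nanoGPT-style per-layer causal-mask buffers in the saved state dict (the exact checkpoint contents are an assumption; the small remainder would be serialization overhead):

```python
params = 124_439_808                      # GPT-2 small parameter count for this config
mask_buffers = 12 * 1024 * 1024           # one (block_size x block_size) causal mask per layer
total_bytes = 4 * (params + mask_buffers) # float32 = 4 bytes per element
print(total_bytes)                        # 548_090_880, near the 548_148_666-byte file
```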