Feat: Updated to 124M model

Files changed:
- README.md (+7 -12)
- app.py (+7 -7)
- nano_gpt_model.pt (+2 -2)
README.md (CHANGED)

````diff
@@ -19,10 +19,10 @@ This section focuses on Embeddings and Pre-training.
 
 In this project, a GPT (decoder-only) model is trained on Shakespeare data. The model architecture follows the original GPT design with multi-head self-attention and feed-forward layers. Key specifications include:
 
-
-
-
-
+- 12 transformer layers
+- 12 attention heads
+- 768 embedding dimensions
+- 1024 context window size
 - ~50k vocabulary size
 
 The model is trained using cross-entropy loss and AdamW optimizer with weight decay. Training is done on Shakespeare's works to learn the language patterns and writing style. The trained model can generate Shakespeare-style text given a prompt.
@@ -31,7 +31,7 @@ The model is trained using cross-entropy loss and AdamW optimizer with weight de
 
 ### Project Structure
 
-```
+```bash
 .
 ├── assets             # Images for README
 ├── nano_gpt_model.pt  # Trained model
@@ -45,7 +45,7 @@ The model is trained using cross-entropy loss and AdamW optimizer with weight de
 
 ### Install Dependencies
 
-```
+```bash
 pip install -r requirements.txt
 ```
 
@@ -53,7 +53,7 @@ pip install -r requirements.txt
 
 ### Run the Notebook
 
-```
+```bash
 jupyter notebook S12Trained.ipynb
 ```
 
@@ -136,9 +136,4 @@ For the ground is nothing henceforth fell executioner come
 
 
 
 
-### Try it out
-
-App Link: https://huggingface.co/spaces/Shilpaj/ShakespeareGPT
-
-
````
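The updated README spec (12 layers, 12 heads, 768-dim embeddings, ~50k vocabulary, 1024 context) matches the GPT-2 "small" configuration, and tallying the weights shows where the "124M" in the commit title comes from. A quick sketch, assuming a GPT-2-style block layout with tied input/output embeddings (not taken from this repo's code):

```python
# Parameter tally for the config in this commit (assumed GPT-2-style layout).
n_layer, n_embd = 12, 768
vocab_size, block_size = 50257, 1024

tok_emb = vocab_size * n_embd              # token embeddings (tied with lm_head)
pos_emb = block_size * n_embd              # learned positional embeddings
per_block = (
    2 * n_embd                             # LayerNorm 1 (weight + bias)
    + n_embd * 3 * n_embd + 3 * n_embd     # fused QKV projection
    + n_embd * n_embd + n_embd             # attention output projection
    + 2 * n_embd                           # LayerNorm 2
    + n_embd * 4 * n_embd + 4 * n_embd     # MLP up-projection
    + 4 * n_embd * n_embd + n_embd         # MLP down-projection
)
final_ln = 2 * n_embd
total = tok_emb + pos_emb + n_layer * per_block + final_ln
print(f"{total:,} parameters")             # 124,439,808, i.e. ~124M
```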
app.py (CHANGED)

```diff
@@ -11,11 +11,11 @@ import spaces
 
 # Configuration class (same as in training)
 @dataclass
 class GPTConfig:
-    block_size: int =
-    vocab_size: int =
-    n_layer: int =
-    n_head: int =
-    n_embd: int =
+    block_size: int = 1024
+    vocab_size: int = 50257
+    n_layer: int = 12
+    n_head: int = 12
+    n_embd: int = 768
 
 # Model architecture classes (copied from training notebook)
 class CausalSelfAttention(nn.Module):
@@ -154,8 +154,8 @@ model, device = load_model()
 demo = gr.Interface(
     fn=generate_text,
     inputs=[
-        gr.Textbox(label="Enter your prompt", value="
-        gr.Slider(minimum=1, maximum=
+        gr.Textbox(label="Enter your prompt", value="Thou shalt"),
+        gr.Slider(minimum=1, maximum=1024, value=100, step=1, label="Number of tokens to generate"),
         gr.Slider(minimum=0.1, maximum=2.0, value=0.8, step=0.1, label="Temperature (higher = more random)")
     ],
     outputs=gr.Textbox(label="Generated Text"),
```
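The Temperature slider in the interface controls how sharply the next-token distribution is peaked before sampling. A minimal sketch of standard temperature sampling (an illustration of the technique, not the app's actual `generate_text` code):

```python
import math
import random

def sample_with_temperature(logits, temperature=0.8, rng=random):
    """Scale logits by 1/temperature, softmax, then draw one index."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    r = rng.random() * sum(exps)              # draw from the unnormalized CDF
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r < acc:
            return i
    return len(exps) - 1

# Low temperature -> near-greedy; high temperature -> near-uniform.
idx = sample_with_temperature([1.0, 10.0, 2.0], temperature=0.01)
```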
nano_gpt_model.pt (CHANGED)

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:c1799438dae530e76e501535d8c2431c7658ed6354d9dd537dcb6c3c1ac86ab8
+size 548148666
```
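Only the Git LFS pointer file changed here; the actual weights live in LFS storage, identified by the sha256 oid and byte size above. A hedged sketch of checking a downloaded checkpoint against a pointer's values (`verify_lfs_object` is a hypothetical helper, not part of this repo):

```python
import hashlib
import os

def verify_lfs_object(path, expected_oid, expected_size):
    """Return True if the file matches the LFS pointer's size and sha256 oid."""
    if os.path.getsize(path) != expected_size:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large checkpoints don't load into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_oid

# Values from this commit's pointer file:
OID = "c1799438dae530e76e501535d8c2431c7658ed6354d9dd537dcb6c3c1ac86ab8"
SIZE = 548148666
# verify_lfs_object("nano_gpt_model.pt", OID, SIZE)
```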