Arjun-G-Ravi committed · Commit 3220769 · verified · 1 Parent(s): c6dd1b0

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +24 -0
  2. config.json +14 -0
  3. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,24 @@
+
+ # Custom GPT Model
+
+ This is a custom GPT model with the following modifications from standard GPT-2:
+ - RMS normalization instead of LayerNorm
+ - Rotary positional embeddings (RoPE)
+ - Separate Q,K,V projections
+ - Squared ReLU activation in MLP
+ - QK normalization in attention
+ - Zero initialization for projection layers
+
+ ## Model Architecture
+ - Vocabulary Size: 50304
+ - Context Length: 1024
+ - Number of Layers: 12
+ - Number of Heads: 6
+ - Embedding Dimension: 768
+
+ ## Usage
+ ```python
+ from transformers import AutoModel
+ model = AutoModel.from_pretrained("Arjun-G-Ravi/Custom-GPT-555k")
+ ```
+
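Two of the modifications listed in the README — RMS normalization and the squared-ReLU MLP activation — are easy to illustrate in isolation. The following is a minimal NumPy sketch of what those operations compute; it is not this model's actual implementation, and the function names are chosen here for illustration:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square of the activations.
    # Unlike LayerNorm, there is no mean subtraction and no bias term.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def squared_relu(x):
    # Squared ReLU: relu(x) ** 2, zero for negative inputs,
    # quadratic growth for positive ones.
    return np.maximum(x, 0.0) ** 2
```

After `rms_norm` with a unit weight, each row of the output has root-mean-square close to 1, which is the invariant the normalization enforces.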
config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "_attn_implementation_autoset": true,
+   "architectures": [
+     "CustomGPTPreTrainedModel"
+   ],
+   "block_size": 1024,
+   "model_type": "custom_gpt",
+   "n_embd": 768,
+   "n_head": 6,
+   "n_layer": 12,
+   "tokenizer_class": "GPT2Tokenizer",
+   "transformers_version": "4.48.1",
+   "vocab_size": 50304
+ }
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9b9241bfa5721a46c8186e18b74637299de0857ed13679a524e85dac34e08d0
+ size 494301897