`NOTE: not in a working state yet, please wait`
# Custom GPT Model
This is a custom GPT model with:
- RMS normalization
- Rotary positional embeddings (RoPE)
- Separate Q, K, V projections
- Squared ReLU activation in MLP
- QK normalization in attention
- Zero initialization for projection layers
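As an illustration of a few of the features above, here is a minimal PyTorch sketch of RMS normalization, a squared-ReLU MLP, and zero-initialized output projections. The class and attribute names (`RMSNorm`, `SquaredReLUMLP`, `fc_in`, `fc_out`) are assumptions for this sketch, not the module names used in the actual repository.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """RMS normalization: rescales by the root-mean-square of the
    features (no mean-centering, unlike LayerNorm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight


class SquaredReLUMLP(nn.Module):
    """Feed-forward block with squared-ReLU activation: relu(x) ** 2."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.fc_in = nn.Linear(dim, hidden_dim)
        self.fc_out = nn.Linear(hidden_dim, dim)
        # Zero-init the output projection (the card lists zero
        # initialization for projection layers), so a residual block
        # starts out as the identity.
        nn.init.zeros_(self.fc_out.weight)
        nn.init.zeros_(self.fc_out.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc_out(torch.relu(self.fc_in(x)) ** 2)
```

With the zero-initialized output projection, the block initially contributes nothing to the residual stream, which tends to stabilize early training.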
## Architecture
- Vocabulary Size: 50304
- Context Length: 1024
- Number of Layers: 12
- Number of Heads: 6
- Embedding Dimension: 768
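The hyperparameters above can be collected into a small config object; note that with 6 heads over a 768-dimensional embedding, each head works in a 128-dimensional subspace. The `GPTConfig` dataclass and its field names below are hypothetical, chosen only to mirror the numbers in this card.

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # Values taken from the Architecture section of this card.
    vocab_size: int = 50304
    block_size: int = 1024   # context length
    n_layer: int = 12
    n_head: int = 6
    n_embd: int = 768


cfg = GPTConfig()
head_dim = cfg.n_embd // cfg.n_head  # dimension per attention head
```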
## Usage
```python
from transformers import AutoModel
model = AutoModel.from_pretrained("Arjun-G-Ravi/Custom-GPT-555k")
```