# Custom GPT Model

This is a custom GPT model with the following modifications from standard GPT-2 (see the illustrative sketch at the end of this card):

- RMS normalization (RMSNorm) instead of LayerNorm
- Rotary positional embeddings (RoPE)
- Separate Q, K, V projections
- Squared ReLU activation in the MLP
- QK normalization in attention
- Zero initialization for projection layers

## Model Architecture

- Vocabulary Size: 50304
- Context Length: 1024
- Number of Layers: 12
- Number of Heads: 6
- Embedding Dimension: 768

## Usage

```python
from transformers import AutoModel

# Custom architectures usually need trust_remote_code=True to load their
# modeling code; drop the flag if the checkpoint loads without it.
model = AutoModel.from_pretrained("Arjun-G-Ravi/Custom-GPT-555k", trust_remote_code=True)
```
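## Architecture Sketch

The pieces listed above fit together roughly as in the following PyTorch sketch. This is a minimal illustration written for this card, not the released implementation: the names (`rms_norm`, `apply_rope`, `Attention`, `MLP`, `Block`) are assumptions, the RMSNorm is parameter-free for brevity, and the real model may differ in details such as biases, learnable gains, or MLP width.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def rms_norm(x, eps=1e-6):
    # RMS normalization: rescale by the root-mean-square; unlike LayerNorm,
    # no mean subtraction (and no learnable gain in this simplified version).
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)


def apply_rope(x, theta=10000.0):
    # Rotary positional embeddings: rotate (even, odd) channel pairs of each
    # head by a position-dependent angle. x has shape (B, heads, T, head_dim).
    T, D = x.shape[-2], x.shape[-1]
    freqs = theta ** (-torch.arange(0, D, 2, device=x.device).float() / D)
    angles = torch.outer(torch.arange(T, device=x.device).float(), freqs)
    cos, sin = angles.cos(), angles.sin()  # each of shape (T, D/2)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


class Attention(nn.Module):
    def __init__(self, dim=768, n_heads=6):
        super().__init__()
        self.n_heads = n_heads
        # Separate Q, K, V projections instead of one fused qkv linear.
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        nn.init.zeros_(self.proj.weight)  # zero-init output projection

    def forward(self, x):
        B, T, C = x.shape
        split = (B, T, self.n_heads, C // self.n_heads)
        q = self.q(x).view(split).transpose(1, 2)
        k = self.k(x).view(split).transpose(1, 2)
        v = self.v(x).view(split).transpose(1, 2)
        # QK normalization: normalize queries and keys before the dot product.
        q, k = rms_norm(q), rms_norm(k)
        q, k = apply_rope(q), apply_rope(k)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(B, T, C))


class MLP(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.fc = nn.Linear(dim, 4 * dim, bias=False)
        self.proj = nn.Linear(4 * dim, dim, bias=False)
        nn.init.zeros_(self.proj.weight)  # zero-init output projection

    def forward(self, x):
        # Squared ReLU: relu(x) ** 2 instead of GPT-2's GELU.
        return self.proj(F.relu(self.fc(x)).square())


class Block(nn.Module):
    # One transformer block: pre-norm residual layout with RMS normalization.
    def __init__(self, dim=768, n_heads=6):
        super().__init__()
        self.attn = Attention(dim, n_heads)
        self.mlp = MLP(dim)

    def forward(self, x):
        x = x + self.attn(rms_norm(x))
        x = x + self.mlp(rms_norm(x))
        return x
```

Because the output projections start at zero, each residual block acts as the identity at initialization, which tends to stabilize early training.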