> **Note:** This model is not yet in a working state. Please wait for a future update.

# Custom GPT Model

This is a custom GPT model with:

- RMS normalization
- Rotary positional embeddings (RoPE)
- Separate Q, K, V projections
- Squared ReLU activation in the MLP
- QK normalization in attention
- Zero initialization for projection layers

A minimal sketch of these components appears at the end of this page.

## Architecture

- Vocabulary Size: 50304
- Context Length: 1024
- Number of Layers: 12
- Number of Heads: 6
- Embedding Dimension: 768

## Usage

```python
from transformers import AutoModel

# trust_remote_code may be needed if the repository ships its own
# modeling code, as custom architectures usually do.
model = AutoModel.from_pretrained(
    "Arjun-G-Ravi/Custom-GPT-555k",
    trust_remote_code=True,
)
```
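
For illustration, below is a minimal PyTorch sketch of the components listed above: RMSNorm, RoPE, separate Q/K/V projections with QK normalization, a squared-ReLU MLP, and zero-initialized output projections. The module names, shapes, and defaults here are assumptions chosen for clarity; they are not taken from this repository's actual modeling code.

```python
# Illustrative sketch only -- not the repository's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class MLP(nn.Module):
    """Feed-forward block with squared ReLU activation."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)
        nn.init.zeros_(self.down.weight)  # zero-init the output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.relu(self.up(x)).square())


def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to a (batch, heads, seq, head_dim) tensor."""
    b, h, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, device=x.device) / half)
    angles = torch.arange(t, device=x.device)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class Attention(nn.Module):
    """Causal attention with separate Q/K/V projections, per-head QK norm, and RoPE."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.q = nn.Linear(dim, dim, bias=False)  # separate projections,
        self.k = nn.Linear(dim, dim, bias=False)  # rather than one fused
        self.v = nn.Linear(dim, dim, bias=False)  # QKV matrix
        self.out = nn.Linear(dim, dim, bias=False)
        self.q_norm = RMSNorm(self.head_dim)  # QK normalization
        self.k_norm = RMSNorm(self.head_dim)
        nn.init.zeros_(self.out.weight)  # zero-init the output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        shape = (b, t, self.n_heads, self.head_dim)
        q = self.q_norm(self.q(x).view(shape)).transpose(1, 2)
        k = self.k_norm(self.k(x).view(shape)).transpose(1, 2)
        v = self.v(x).view(shape).transpose(1, 2)
        q, k = rope(q), rope(k)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, d))
```

With `dim=768` and `n_heads=6`, these modules match the hyperparameters listed under Architecture (a head dimension of 128).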