
Note: this model is not in a working state yet. Please wait.

Custom GPT Model

This is a custom GPT model with:

  • RMS normalization
  • Rotary positional embeddings (RoPE)
  • Separate Q, K, V projections
  • Squared ReLU activation in MLP
  • QK normalization in attention
  • Zero initialization for projection layers
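
To make three of the components listed above concrete, here is a minimal pure-Python sketch of RMS normalization, the squared ReLU activation, and a rotary embedding applied to one vector. It is an illustration under assumptions, not this model's code: the function names, the `eps` and `base` defaults, the absence of a learned gain in the norm, and the adjacent-pair RoPE layout are all chosen here for clarity, and the real implementation may differ (for example, by splitting the vector into halves instead of adjacent pairs).

```python
import math


def rms_norm(x, eps=1e-6):
    """Scale a vector so its root-mean-square is about 1.

    Assumption: no learned gain; eps is an illustrative default.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def squared_relu(v):
    """Squared ReLU: max(v, 0) ** 2."""
    return max(v, 0.0) ** 2


def rope(x, position, base=10000.0):
    """Rotate adjacent pairs (x[2i], x[2i+1]) by position * base**(-2i/d).

    Assumption: adjacent-pair layout and base=10000; other layouts exist.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even vector length")
    out = []
    for i in range(d // 2):
        angle = position * base ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out
```

In a model with QK normalization, a norm such as `rms_norm` would typically be applied to each head's query and key vectors before the attention scores are computed; exactly where this model applies it is not stated in this card.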

Architecture

  • Vocabulary Size: 50304
  • Context Length: 1024
  • Number of Layers: 12
  • Number of Heads: 6
  • Embedding Dimension: 768
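
The figures above can be collected in a small config object to derive the per-head dimension. This is a sketch only: `GPTConfig` and its field names are hypothetical and are not taken from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPTConfig:
    """Hypothetical container for the hyperparameters listed in this card."""
    vocab_size: int = 50304
    context_length: int = 1024
    n_layers: int = 12
    n_heads: int = 6
    n_embd: int = 768

    @property
    def head_dim(self) -> int:
        # The embedding dimension is split evenly across attention heads.
        if self.n_embd % self.n_heads != 0:
            raise ValueError("n_embd must be divisible by n_heads")
        return self.n_embd // self.n_heads


config = GPTConfig()
print(config.head_dim)                    # 768 / 6 = 128
print(config.vocab_size * config.n_embd)  # token embedding entries: 38633472
```

A head dimension of 128 is even, which RoPE requires because it rotates dimensions in pairs.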

Usage

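from transformers import AutoModel

# Load the checkpoint from the Hugging Face Hub.
# If the repository ships its own modeling code (an assumption; this card does
# not say), from_pretrained also needs trust_remote_code=True.
model = AutoModel.from_pretrained("Arjun-G-Ravi/Custom-GPT-555k")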