**Note:** This model is not yet in a working state. Please check back later.
# Custom GPT Model

This is a custom GPT model with the following components (a minimal sketch of each appears after the list):
- RMS normalization
- Rotary positional embeddings (RoPE)
- Separate Q, K, V projections
- Squared ReLU activation in MLP
- QK normalization in attention
- Zero initialization for projection layers
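
The repository's source is not reproduced here, so the following is a minimal PyTorch sketch of the distinctive pieces: RMS normalization, a squared-ReLU MLP, and QK-normalized causal attention with separate Q/K/V projections and zero-initialized output projections. All class names, parameter names, and defaults are illustrative assumptions, not this repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Normalize by root-mean-square instead of mean/variance (no bias term)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class SquaredReLUMLP(nn.Module):
    """MLP whose activation is relu(x)**2 instead of GELU."""
    def __init__(self, dim: int = 768, hidden: int = 4 * 768):
        super().__init__()
        self.up = nn.Linear(dim, hidden)
        self.down = nn.Linear(hidden, dim)
        nn.init.zeros_(self.down.weight)  # zero-initialized projection layer

    def forward(self, x):
        return self.down(F.relu(self.up(x)).square())

class Attention(nn.Module):
    """Causal attention with separate Q/K/V projections and QK normalization."""
    def __init__(self, dim: int = 768, n_heads: int = 6):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        nn.init.zeros_(self.out.weight)  # zero-initialized projection layer

    def forward(self, x):
        B, T, C = x.shape
        q = self.q(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = self.v(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)  # QK normalization
        # RoPE would rotate q and k by position-dependent angles here (omitted for brevity)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(B, T, C))
```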
## Architecture
- Vocabulary Size: 50304
- Context Length: 1024
- Number of Layers: 12
- Number of Heads: 6
- Embedding Dimension: 768
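
These hyperparameters could be collected into a config object along the following lines; the field names are assumptions and may differ from the repository's actual config. The vocabulary size of 50304 is likely the 50257-token GPT-2 vocabulary padded up to a multiple of 64 for hardware efficiency.

```python
from dataclasses import dataclass

@dataclass
class CustomGPTConfig:
    vocab_size: int = 50304   # possibly GPT-2's 50257 padded to a multiple of 64
    block_size: int = 1024    # context length
    n_layer: int = 12
    n_head: int = 6
    n_embd: int = 768         # embedding dim; head_dim = 768 // 6 = 128
```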
## Usage

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("Arjun-G-Ravi/Custom-GPT-555k")
```
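
Because the architecture is custom, loading it through `AutoModel` may additionally require `trust_remote_code=True` (this assumes the checkpoint ships its own modeling code); the tokenizer call below is likewise an assumption. A hedged end-to-end sketch:

```python
from transformers import AutoModel, AutoTokenizer

repo = "Arjun-G-Ravi/Custom-GPT-555k"
tokenizer = AutoTokenizer.from_pretrained(repo)  # assumes a tokenizer is published with the model
model = AutoModel.from_pretrained(repo, trust_remote_code=True)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)  # base (headless) model: returns hidden states, not logits
```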