---
license: mit
---
# einygpt
Here are the models I've trained using the transformer I wrote in einygpt. For reference, they are:
- a multihead attention model replicating the model discussed in the TinyStories paper, using the GPT2Tokenizer
- a multiquery attention model, also using the GPT2Tokenizer
- a grouped query attention model with the number of groups set to 4, using its own tokenizer (the difference between these attention variants is sketched below)
To play with these models, you can see how they are used here.