---
license: mit
---

# einygpt

Here are the models I've trained using the transformer I wrote in [einygpt](https://github.com/clankur/einygpt). For reference, they are:

- [a multihead attention model](./model_weights_mha.pth) replicating the model discussed in the [TinyStories paper](https://arxiv.org/abs/2305.07759), using the GPT2Tokenizer
- [a multiquery attention model](model_weights_mqa.pth) using the GPT2Tokenizer
- [a grouped query attention model with 4 groups](model_weights_gqa_tt.pth) using its own [tokenizer](https://github.com/clankur/einygpt/blob/main/tiny_tokenizer.py)

To play with these models, you can see how they are used [here](https://github.com/clankur/einygpt/blob/main/perplexity.ipynb).
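
If you just want to inspect a checkpoint outside the notebook, a minimal sketch along these lines should work. It assumes the `.pth` files hold plain state dicts saved with `torch.save` and that you have downloaded one locally; the actual model definitions live in the einygpt repo linked above.

```python
import torch

# Load one of the checkpoints listed above
# (assumed to be a state dict; see the einygpt repo for the model code).
state_dict = torch.load("model_weights_mha.pth", map_location="cpu")

# Print each parameter's name and shape to get a feel for the architecture.
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)}")
```

For actually generating text or computing perplexity, follow the notebook linked above, which builds the model with the einygpt code and loads these weights into it.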