---
license: mit
---
# einygpt
Here are the models I've trained using the transformer I wrote in [einygpt](https://github.com/clankur/einygpt). For reference, they are:
- [a multihead attention model](./model_weights_mha.pth) replicating the model discussed in the [TinyStories paper](https://arxiv.org/abs/2305.07759) using the GPT2Tokenizer
- [a multiquery attention model](model_weights_mqa.pth) using the GPT2Tokenizer
- [a grouped query attention model with 4 groups](model_weights_gqa_tt.pth), using its own [tokenizer](https://github.com/clankur/einygpt/blob/main/tiny_tokenizer.py)
To play with these models, you can see how they are used [here](https://github.com/clankur/einygpt/blob/main/perplexity.ipynb).
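
If you want to poke at a checkpoint directly, here is a minimal sketch. It assumes each `.pth` file stores a PyTorch state dict and that the MHA/MQA checkpoints pair with the GPT-2 tokenizer; the model class itself lives in the einygpt repo, so `TransformerModel` below is only a placeholder — see the perplexity notebook for the exact loading code.

```python
import torch
from transformers import GPT2Tokenizer

# Load the MHA checkpoint on CPU and peek at its parameter names
# (assumes the .pth file holds a plain state dict).
state_dict = torch.load("model_weights_mha.pth", map_location="cpu")
print(list(state_dict.keys())[:5])

# The MHA and MQA checkpoints expect GPT-2 tokenization.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids

# `TransformerModel` stands in for whichever class einygpt defines;
# construct it with matching hyperparameters before loading the weights.
# model = TransformerModel(...)
# model.load_state_dict(state_dict)
# model.eval()
```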