Performance oddities of the 3M model.
#3 opened 9 months ago
by
MartialTerran
GPT-2 model having16 4-float attention heads
#2 opened 9 months ago
by
MartialTerran
Adding `safetensors` variant of this model
#1 opened about 1 year ago
by
SFconvertbot
![](https://cdn-avatars.huggingface.co/v1/production/uploads/635fd4cc14657fb8cff2a081/GDkyDwAcuqDBpaOvQgJuq.png)