Softpick Collection
5 pretrained models from the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"
Code: https://github.com/zaydzuhri/softpick-attention
These models are only usable through these repositories:
https://github.com/zaydzuhri/flash-linear-attention/tree/softpick-attention
https://github.com/zaydzuhri/flame/tree/softpick-attention