zaydzuhri/vanilla-340M-4096-model


This model is from the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" (arxiv.org/abs/2504.20966).

It is also used in "Token Order Prediction" (arxiv.org/abs/2508.19228).

See code: https://github.com/zaydzuhri/softpick-attention

This model is only usable through these repositories:

https://github.com/zaydzuhri/flash-linear-attention/tree/softpick-attention

https://github.com/zaydzuhri/flame/tree/softpick-attention
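Since the model depends on the forked repositories above, a loading sketch may help. The following is a minimal, untested sketch, not the author's documented procedure. It assumes that the fork is installed from its `softpick-attention` branch, that importing `fla` registers the model architecture with `transformers` (as upstream flash-linear-attention does), and that the checkpoint loads through `AutoModelForCausalLM`. The helper name `load_model` is hypothetical.

```python
# Assumed install step (run in a shell first; not verified against the fork):
#   pip install git+https://github.com/zaydzuhri/flash-linear-attention@softpick-attention

MODEL_ID = "zaydzuhri/vanilla-340M-4096-model"


def load_model(model_id: str = MODEL_ID):
    """Hypothetical helper: load the checkpoint through the forked fla package."""
    # Assumption: importing fla registers its custom architectures with
    # transformers' Auto classes, as the upstream library does.
    import fla  # noqa: F401
    from transformers import AutoModelForCausalLM

    # Downloads the weights from the Hugging Face Hub on first use.
    return AutoModelForCausalLM.from_pretrained(model_id)
```

Calling `load_model()` requires network access and the forked package. The imports are deferred into the function so that the module itself can be imported without either.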


Safetensors: model size 374M params, tensor type F32


Collections including zaydzuhri/vanilla-340M-4096-model

Softpick: Pretrained models from the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" (5 items)

Token Order Prediction: Pretrained models from the paper "Predicting the Order of Upcoming Tokens Improves Language Modeling" (10 items)