zaydzuhri/vanilla-340M-4096-model

Safetensors
transformer
This model is from the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" (arxiv.org/abs/2504.20966).

It is also used in "Token Order Prediction" (arxiv.org/abs/2508.19228).

Code: https://github.com/zaydzuhri/softpick-attention

This model is only usable through these repositories:
  • https://github.com/zaydzuhri/flash-linear-attention/tree/softpick-attention
  • https://github.com/zaydzuhri/flame/tree/softpick-attention
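Since the checkpoint depends on the forked repositories, a loading sketch may help. The following is a minimal, unverified outline, not the author's documented procedure. It assumes the `softpick-attention` branch of the flash-linear-attention fork is pip-installable (for example with `pip install git+https://github.com/zaydzuhri/flash-linear-attention@softpick-attention`) and that importing `fla` registers the fork's model classes with the `transformers` Auto factories, as upstream flash-linear-attention does. The helper name `load_model` is hypothetical.

```python
# Identifiers taken from this page; the install spec is an assumption
# (pip VCS syntax pointing at the fork's softpick-attention branch).
REPO_ID = "zaydzuhri/vanilla-340M-4096-model"
FLA_FORK = (
    "git+https://github.com/zaydzuhri/flash-linear-attention"
    "@softpick-attention"
)


def load_model(repo_id: str = REPO_ID):
    """Hypothetical helper: load the checkpoint once the fork is installed."""
    # Assumption: importing `fla` registers the custom model type with
    # transformers, so that AutoModelForCausalLM can resolve the config.
    import fla  # noqa: F401  (imported only for its registration side effect)
    from transformers import AutoModelForCausalLM

    # Downloads the weights from the Hugging Face Hub on first use.
    return AutoModelForCausalLM.from_pretrained(repo_id)
```

Calling `load_model()` requires network access and the fork's dependencies; the imports are deferred into the function so that defining it has no side effects.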

Model size: 374M params (Safetensors)
Tensor type: F32
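As a rough illustration of what the listed size implies, the sketch below estimates the raw weight footprint from the 374M parameter count and the F32 tensor type (4 bytes per parameter). The parameter count is the rounded figure shown on the page, so the result is approximate, and the function name `checkpoint_size_bytes` is hypothetical.

```python
def checkpoint_size_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Raw weight size, ignoring file headers and metadata."""
    return num_params * bytes_per_param


# 374M is the rounded count from the page; F32 means 4 bytes per parameter.
NUM_PARAMS = 374_000_000
size = checkpoint_size_bytes(NUM_PARAMS)

print(f"{size / 1e9:.2f} GB")     # 1.50 GB (decimal)
print(f"{size / 2**30:.2f} GiB")  # 1.39 GiB (binary)
```

Loading in a 16-bit type would roughly halve this figure (pass `bytes_per_param=2`), though whether the forked code supports that is not stated here.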
Collections including zaydzuhri/vanilla-340M-4096-model

  • Softpick: pretrained models from the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" (5 items)
  • Token Order Prediction: pretrained models from the paper "Predicting the Order of Upcoming Tokens Improves Language Modeling" (10 items)