---
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
language:
- en
---
This model accompanies the papers [MoM: Linear Sequence Modeling with Mixture-of-Memories](https://arxiv.org/abs/2502.13685) and [Gated Slot Attention for Efficient Linear-Time Sequence Modeling](https://arxiv.org/abs/2409.07146).
It was trained on a 15B-token sample of SlimPajama.
Due to changes to the MLP layer structure in the latest version of `fla` (flash-linear-attention), these weights cannot be loaded with current releases. Pin the library to [this commit](https://github.com/fla-org/flash-linear-attention/tree/8346a33792558d8e3eb206fe18404de037e11d9c) instead.
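
A minimal loading sketch, assuming the usual `fla` convention that importing the package registers its model classes with the 🤗 Transformers Auto classes; the repo id below is a placeholder inferred from this page and may differ from the model's actual Hub path:

```python
# Pin fla to the compatible commit before loading, e.g.:
#   pip install git+https://github.com/fla-org/flash-linear-attention.git@8346a33792558d8e3eb206fe18404de037e11d9c
import fla  # noqa: F401 -- importing fla registers the GSA model classes with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "JusenK/GSA-340M"  # placeholder repo id; replace with this model's actual Hub path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Linear attention is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```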