Linear_Tiny_87M

Introduction

Linear transformers have emerged as a subquadratic-time alternative to softmax attention and have garnered significant interest due to their fixed-size recurrent state, which lowers inference cost. However, their original formulation suffers from poor scaling and underperforms compute-matched transformers. Recent linear models such as RWKV and Mamba have attempted to address these shortcomings by proposing novel time-mixing and gating architectures, but pre-training large language models requires significant data and compute investments. Thus, the search for subquadratic architectures is limited by the availability of compute and quality pre-training datasets. As a cost-effective alternative to pre-training linear transformers, we propose Scalable UPtraining for Recurrent Attention (SUPRA). For more detail, refer to the SUPRA paper.
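
To make the fixed-size recurrent state concrete, below is a minimal NumPy sketch of a generic linear-attention decoding step. It is illustrative only: the feature map phi and the exact update rule are assumptions for this sketch, not the SUPRA or open_lm implementation. The key point is that the state (S, z) keeps a constant size no matter how many tokens have been processed, so per-token inference cost does not grow with context length.

import numpy as np

def phi(x):
    # Kernel feature map (assumed for illustration); elu(x) + 1 is a common choice.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_step(q, k, v, S, z, eps=1e-6):
    # One recurrent decoding step: fold the new key/value into the running
    # state, then read out the output for the current query.
    qf, kf = phi(q), phi(k)
    S = S + np.outer(kf, v)   # d x d matrix of accumulated key-value products
    z = z + kf                # d-dimensional normalizer
    out = (qf @ S) / (qf @ z + eps)
    return out, S, z

d = 64
S, z = np.zeros((d, d)), np.zeros(d)
for _ in range(10):           # process 10 tokens; the state never grows
    q, k, v = (np.random.randn(d) for _ in range(3))
    out, S, z = linear_attention_step(q, k, v, S, z)
print(out.shape)              # (64,)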

Linear_Tiny_87M is a linear model trained on a subset of the RedPajama dataset for 1 epoch on a single A4000 GPU. Training took almost 4 hours to complete.

Usage

Download the checkpoint, then run the following commands:

cd scripts

python generate.py \
  --model open_lm_87m \
  --checkpoint /path/to/checkpoint.pt \
  --positional-embedding-type head_rotary \
  --input-text "Machine Learning is a"