Syed-Hasan-8503 / Linear_Tiny_87M

	---
	license: apache-2.0
	---

	# Linear_Tiny_87M

	## Introduction

	Linear transformers have emerged as a subquadratic-time alternative to softmax attention and have garnered significant interest due to their fixed-size recurrent state that lowers inference cost. However, their original formulation suffers from poor scaling and underperforms compute-matched transformers. Recent linear models such as RWKV and Mamba have attempted to address these shortcomings by proposing novel time-mixing and gating architectures, but pre-training large language models requires significant data and compute investments. Thus, the search for subquadratic architectures is limited by the availability of compute and quality pre-training datasets.
	As a cost-effective alternative to pre-training linear transformers, we propose Scalable UPtraining for Recurrent Attention (SUPRA). For more detail, refer to the [paper](https://arxiv.org/abs/2405.06640).
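
The "fixed-size recurrent state" mentioned above can be illustrated with a small, dependency-free sketch of generic causal linear attention. This is not SUPRA itself and not the code used to train this model: the `elu(x) + 1` feature map, the sum-based normalization, and every function name below are assumptions chosen for illustration. The sketch only shows that the recurrent form, which carries a `d_k x d_v` matrix `S` and a `d_k` vector `z`, gives the same outputs as the form that revisits all earlier positions.

```python
import math
import random


def phi(x):
    """Feature map elu(x) + 1 (an illustrative choice); always positive."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_parallel(qs, ks, vs):
    """Reference form: position t explicitly revisits every position s <= t."""
    d_v = len(vs[0])
    outs = []
    for t, q in enumerate(qs):
        fq = phi(q)
        weights = [dot(fq, phi(ks[s])) for s in range(t + 1)]
        norm = sum(weights)
        outs.append(
            [sum(w * vs[s][j] for s, w in enumerate(weights)) / norm
             for j in range(d_v)]
        )
    return outs


def attention_recurrent(qs, ks, vs):
    """Recurrent form: the state (S, z) has the same size at every step."""
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    outs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        fq = phi(q)
        norm = dot(fq, z)
        outs.append(
            [sum(fq[i] * S[i][j] for i in range(d_k)) / norm
             for j in range(d_v)]
        )
    return outs


# Toy inputs: 6 positions, key/query width 4, value width 3.
random.seed(0)
T, d_k, d_v = 6, 4, 3
qs = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
ks = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
vs = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(T)]

par = attention_parallel(qs, ks, vs)
rec = attention_recurrent(qs, ks, vs)
max_diff = max(abs(a - b) for ra, rb in zip(par, rec) for a, b in zip(ra, rb))
print(f"max abs difference between the two forms: {max_diff:.2e}")
```

Because `S` and `z` do not grow with the sequence, each new token costs the same amount of work and memory during generation, which is where the lower inference cost comes from.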


	Linear_Tiny_87M is a linear model trained on a subset of the RedPajama dataset for 1 epoch on a single A4000 GPU. Training took almost 4 hours.

	## Usage

	Download the checkpoint, then run the following commands:

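	```bash
	# Run from the root of the repository that provides scripts/generate.py
	cd scripts

	# Replace /path/to/checkpoint.pt with the location of the downloaded checkpoint
	python generate.py \
	    --model open_lm_87m \
	    --checkpoint /path/to/checkpoint.pt \
	    --positional-embedding-type head_rotary \
	    --input-text "Machine Learning is a"
	```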
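
The download step could be scripted as in the sketch below, assuming the `huggingface_hub` package is installed. The helper names `download_repo` and `find_checkpoints` and the target directory are hypothetical. The checkpoint's file name is not stated in this card, so the sketch fetches the whole repository and lists any `.pt` files it finds, rather than guessing a name.

```python
from pathlib import Path

# Repository id as shown at the top of this model card.
REPO_ID = "Syed-Hasan-8503/Linear_Tiny_87M"


def download_repo(local_dir="Linear_Tiny_87M"):
    """Fetch every file in the model repo and return the local folder.

    huggingface_hub is imported here so that the rest of this file can be
    used without it. Calling this function requires network access.
    """
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))


def find_checkpoints(directory):
    """List .pt files under `directory` (assumes a .pt checkpoint extension)."""
    return sorted(Path(directory).rglob("*.pt"))


# Example usage (not executed here because it downloads files):
#   folder = download_repo()
#   for ckpt in find_checkpoints(folder):
#       print(ckpt)  # pass one of these paths to --checkpoint
```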