---
license: apache-2.0
---

# Linear_Tiny_87M

## Introduction

Linear transformers have emerged as a subquadratic-time alternative to softmax attention and have garnered significant interest due to their fixed-size recurrent state that lowers inference cost. However, their original formulation suffers from poor scaling and underperforms compute-matched transformers. Recent linear models such as RWKV and Mamba have attempted to address these shortcomings by proposing novel time-mixing and gating architectures, but pre-training large language models requires significant data and compute investments. Thus, the search for subquadratic architectures is limited by the availability of compute and quality pre-training datasets.

As a cost-effective alternative to pre-training linear transformers, we propose Scalable UPtraining for Recurrent Attention (SUPRA). For more details, refer to the [paper](https://arxiv.org/abs/2405.06640).
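
To see why the recurrent state is fixed-size, here is a minimal sketch of the recurrent form of linear attention. This is illustrative only, not the SUPRA implementation: the feature map `phi` and the shapes are assumptions made for the example.

```python
import numpy as np

def phi(x):
    # Assumed positive feature map; real linear transformers choose this carefully.
    return np.maximum(x, 0.0) + 1.0

def linear_attention_recurrent(queries, keys, values):
    """Recurrent (RNN-style) linear attention over a sequence.

    Unlike softmax attention, which must keep the whole KV cache, this carries
    only a fixed-size (d x d) state matrix S and a (d,) normalizer z, so the
    per-token inference cost stays constant regardless of sequence length.
    """
    d = queries.shape[-1]
    S = np.zeros((d, d))   # fixed-size recurrent state
    z = np.zeros(d)        # running normalizer
    outputs = []
    for q_t, k_t, v_t in zip(queries, keys, values):
        S += np.outer(phi(k_t), v_t)   # accumulate key-value outer products
        z += phi(k_t)
        outputs.append((phi(q_t) @ S) / (phi(q_t) @ z + 1e-6))
    return np.stack(outputs)

# Toy usage: 10 tokens with head dimension 8; runs in O(T * d^2) time.
T, d = 10, 8
q, k, v = (np.random.randn(T, d) for _ in range(3))
out = linear_attention_recurrent(q, k, v)  # shape (10, 8)
```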

Linear_Tiny_87M is a linear model trained on a subset of the RedPajama dataset for 1 epoch on **1x A4000**. Training took almost 4 hours to complete.

## Usage

Download the checkpoint, then run the following snippet:

```bash
cd scripts

python generate.py \
  --model open_lm_87m \
  --checkpoint /path/to/checkpoint.pt \
  --positional-embedding-type head_rotary \
  --input-text "Machine Learning is a"
```
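
Here `--model open_lm_87m` selects the 87M-parameter open_lm configuration, `--checkpoint` should point to the downloaded checkpoint file, and `--positional-embedding-type head_rotary` should match the positional embedding the model was trained with.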