Update README.md

The sparse version of GPT-J 6B is a pruned variant derived from the original GPT-J 6B.

The model consists of 28 layers with a model dimension of 4096, and a feedforward dimension of 16384. The model dimension is split into 16 heads, each with a dimension of 256. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as GPT-2/GPT-3.
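
As a quick sanity check of these numbers, the architecture can be read off the model configuration with `transformers`. A minimal sketch, assuming the upstream `EleutherAI/gpt-j-6b` checkpoint ID (the pruned variant described here keeps the same architecture):

```python
# Sanity-check the architecture numbers from the paragraph above.
# Assumes the upstream EleutherAI/gpt-j-6b checkpoint ID; requires
# `pip install transformers`.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6b")

assert config.n_layer == 28                   # 28 transformer layers
assert config.n_embd == 4096                  # model (hidden) dimension
assert config.n_head == 16                    # 16 attention heads
assert config.n_embd // config.n_head == 256  # per-head dimension
assert config.rotary_dim == 64                # RoPE applied to 64 dims per head
# The feedforward dimension defaults to 4 * n_embd = 16384 when n_inner
# is left unset in the config.
assert (config.n_inner or 4 * config.n_embd) == 16384
```
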
## Evaluation results

| Model    | Sparsity | Dataset        | Precision | Dense Acc ↑ | Sparse Acc ↑ | Acc fluctuations |
|----------|----------|----------------|-----------|-------------|--------------|------------------|
| gpt-j-6B | 40%      | Lambada_openai | FP32      | 0.6831      | 0.6922       | +1.33%           |
| gpt-j-6B | 40%      | Lambada_openai | BF16      | 0.6771      | 0.6874       | +0.63%           |
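
Accuracies like these can be measured with an evaluation harness such as EleutherAI's `lm-evaluation-harness`. The sketch below is one plausible setup, not the exact command behind the numbers above; the checkpoint ID and dtype are assumptions:

```python
# Hypothetical reproduction sketch using lm-evaluation-harness
# (`pip install lm-eval`). The model card does not state the harness or
# settings used for the table, so the checkpoint ID and dtype here are
# assumptions; substitute the sparse checkpoint to get the sparse column.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-j-6b,dtype=bfloat16",  # BF16 row
    tasks=["lambada_openai"],
)
print(results["results"]["lambada_openai"])  # accuracy on lambada_openai
```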