Update README.md

The sparse version of GPT-J 6B is a pruned variant derived from the original GPT-J 6B.

The model consists of 28 layers with a model dimension of 4096, and a feedforward dimension of 16384. The model dimension is split into 16 heads, each with a dimension of 256. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as GPT-2/GPT-3.
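
As a quick sanity check of these numbers, the architecture can be read off the model configuration with `transformers`. A minimal sketch, assuming the upstream `EleutherAI/gpt-j-6b` checkpoint ID (the pruned variant described here keeps the same architecture):

```python
# Sanity-check the architecture numbers from the paragraph above.
# Assumes the upstream EleutherAI/gpt-j-6b checkpoint ID; requires
# `pip install transformers`.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6b")

assert config.n_layer == 28                   # 28 transformer layers
assert config.n_embd == 4096                  # model (hidden) dimension
assert config.n_head == 16                    # 16 attention heads
assert config.n_embd // config.n_head == 256  # per-head dimension
assert config.rotary_dim == 64                # RoPE applied to 64 dims per head
# The feedforward dimension defaults to 4 * n_embd = 16384 when n_inner
# is left unset in the config.
assert (config.n_inner or 4 * config.n_embd) == 16384
```
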
## Evaluation results

| Model    | Sparsity | Dataset        | Precision | Dense Acc ↑ | Sparse Acc ↑ | Acc fluctuations |
|----------|----------|----------------|-----------|-------------|--------------|------------------|
| gpt-j-6B | 40%      | Lambada_openai | FP32      | 0.6831      | 0.6922       | +1.33%           |
| gpt-j-6B | 40%      | Lambada_openai | BF16      | 0.6771      | 0.6874       | +0.63%           |
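
Accuracies like these can be measured with an evaluation harness such as EleutherAI's `lm-evaluation-harness`. The sketch below is one plausible setup, not the exact command behind the numbers above; the checkpoint ID and dtype are assumptions:

```python
# Hypothetical reproduction sketch using lm-evaluation-harness
# (`pip install lm-eval`). The model card does not state the harness or
# settings used for the table, so the checkpoint ID and dtype here are
# assumptions; substitute the sparse checkpoint to get the sparse column.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-j-6b,dtype=bfloat16",  # BF16 row
    tasks=["lambada_openai"],
)
print(results["results"]["lambada_openai"])  # accuracy on lambada_openai
```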