Update README.md
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ The vocabulary size is 32768.
 
 ## Training
 
-The models are trained with the same configuration as ELECTRA small in the [original ELECTRA paper](https://arxiv.org/abs/2003.10555); 128 tokens per instance, 128 instances per batch, and 1M training steps.
+The models are trained with the same configuration as ELECTRA small in the [original ELECTRA paper](https://arxiv.org/abs/2003.10555) except size; 128 tokens per instance, 128 instances per batch, and 1M training steps.
 
 The size of the generator is the same of the discriminator.
 