julien-c (HF staff) committed
Commit 476a3ea · 1 Parent(s): be7093c

Migrate model card from transformers-repo


Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/shoarora/electra-small-owt/README.md

Files changed (1): README.md ADDED (+59 -0)

# ELECTRA-small-OWT

This is an unofficial implementation of an
[ELECTRA](https://openreview.net/forum?id=r1xMH1BtvB) small model, trained on the
[OpenWebText corpus](https://skylion007.github.io/OpenWebTextCorpus/).

Differences from official ELECTRA models:
- we use a `BertForMaskedLM` as the generator and a `BertForTokenClassification` as the discriminator
- they use an embedding projection layer, but BERT doesn't have one

## Pretraining task
![electra task diagram](https://github.com/shoarora/lmtuners/raw/master/assets/electra.png)
(figure from [Clark et al. 2020](https://openreview.net/pdf?id=r1xMH1BtvB))

ELECTRA uses a discriminative LM / replaced-token-detection task for pretraining:
a generator (a masked LM) produces corrupted examples, and a discriminator classifies
each token as original or replaced.

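Roughly, that pairing can be wired up as below. This is a minimal sketch, not the lmtuners implementation: the config sizes, the 15% masking rate, and greedy sampling from the generator are placeholder assumptions.

```python
# Sketch of replaced-token detection: BertForMaskedLM generator + BertForTokenClassification
# discriminator. Sizes, masking rate, and greedy sampling are illustrative assumptions.
import torch
from transformers import BertConfig, BertForMaskedLM, BertForTokenClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
small = dict(vocab_size=tokenizer.vocab_size, hidden_size=256,
             num_hidden_layers=12, num_attention_heads=4, intermediate_size=1024)
generator = BertForMaskedLM(BertConfig(**small))                                # proposes replacement tokens
discriminator = BertForTokenClassification(BertConfig(**small, num_labels=2))   # original vs. replaced

inputs = tokenizer("the quick brown fox jumps over the lazy dog", return_tensors="pt")
input_ids, attention_mask = inputs["input_ids"], inputs["attention_mask"]

# mask a random subset of positions and let the generator fill them in
mask = torch.rand(input_ids.shape) < 0.15
masked_ids = input_ids.masked_fill(mask, tokenizer.mask_token_id)
with torch.no_grad():
    gen_logits = generator(masked_ids, attention_mask=attention_mask).logits
corrupted = torch.where(mask, gen_logits.argmax(dim=-1), input_ids)

# the discriminator labels each token: 1 if the generator changed it, else 0
labels = (corrupted != input_ids).long()
disc_out = discriminator(corrupted, attention_mask=attention_mask, labels=labels)
print(disc_out.loss)  # replaced-token-detection loss
```
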
## Usage
```python
from transformers import BertForSequenceClassification, BertTokenizer

# the model shares BERT's uncased vocabulary, so the standard BERT tokenizer is used
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# load the pretrained encoder weights into a BERT sequence-classification head for fine-tuning
electra = BertForSequenceClassification.from_pretrained('shoarora/electra-small-owt')
```
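
Continuing from the snippet above, a quick (hypothetical) forward pass confirms the weights load and produce logits; the sequence-classification head itself is randomly initialized until you fine-tune it:

```python
import torch

inputs = tokenizer("ELECTRA swaps masked-LM pretraining for replaced-token detection.",
                   return_tensors="pt")
with torch.no_grad():
    logits = electra(**inputs).logits
print(logits.shape)  # (1, num_labels); the head is untrained until fine-tuning
```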

## Code
The PyTorch module that implements this task is available [here](https://github.com/shoarora/lmtuners/blob/master/lmtuners/lightning_modules/discriminative_lm.py).

Further implementation details are [here](https://github.com/shoarora/lmtuners/tree/master/experiments/disc_lm_small),
and [here](https://github.com/shoarora/lmtuners/blob/master/experiments/disc_lm_small/train_electra_small.py) is the script that created this model.

This specific model was trained with the following params (a rough optimizer/schedule sketch follows the list):
- `batch_size: 512`
- `training_steps: 5e5`
- `warmup_steps: 4e4`
- `learning_rate: 2e-3`
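
For reference, here is how those numbers might map onto a standard warmup-then-linear-decay setup. This is only a sketch; the AdamW optimizer, decay shape, and model config are assumptions, not values read from the training script linked above.

```python
import torch
from transformers import BertConfig, BertForTokenClassification, get_linear_schedule_with_warmup

# discriminator with an assumed small config; see the linked training script for the real values
model = BertForTokenClassification(
    BertConfig(hidden_size=256, num_hidden_layers=12, num_attention_heads=4,
               intermediate_size=1024, num_labels=2)
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-3)   # learning_rate: 2e-3
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(4e4),      # warmup_steps: 4e4
    num_training_steps=int(5e5),    # training_steps: 5e5
)
# each optimizer step would consume one batch of 512 sequences (batch_size: 512)
```
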

## Downstream tasks
#### GLUE Dev results
| Model | # Params | CoLA | SST | MRPC | STS | QQP | MNLI | QNLI | RTE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ELECTRA-Small++ | 14M | 57.0 | 91. | 88.0 | 87.5 | 89.0 | 81.3 | 88.4 | 66.7 |
| ELECTRA-Small-OWT | 14M | 56.8 | 88.3 | 87.4 | 86.8 | 88.3 | 78.9 | 87.9 | 68.5 |
| ELECTRA-Small-OWT (ours) | 17M | 56.3 | 88.4 | 75.0 | 86.1 | 89.1 | 77.9 | 83.0 | 67.1 |
| ALECTRA-Small-OWT (ours) | 4M | 50.6 | 89.1 | 86.3 | 87.2 | 89.1 | 78.2 | 85.9 | 69.6 |

- Table initialized from the [ELECTRA github repo](https://github.com/google-research/electra)

#### GLUE Test results
| Model | # Params | CoLA | SST | MRPC | STS | QQP | MNLI | QNLI | RTE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT-Base | 110M | 52.1 | 93.5 | 84.8 | 85.9 | 89.2 | 84.6 | 90.5 | 66.4 |
| GPT | 117M | 45.4 | 91.3 | 75.7 | 80.0 | 88.5 | 82.1 | 88.1 | 56.0 |
| ELECTRA-Small++ | 14M | 57.0 | 91.2 | 88.0 | 87.5 | 89.0 | 81.3 | 88.4 | 66.7 |
| ELECTRA-Small-OWT (ours) | 17M | 57.4 | 89.3 | 76.2 | 81.9 | 87.5 | 78.1 | 82.4 | 68.1 |
| ALECTRA-Small-OWT (ours) | 4M | 43.9 | 87.9 | 82.1 | 82.0 | 87.6 | 77.9 | 85.8 | 67.5 |