Aidan Mannion committed
Commit · b64e6bb
1 Parent(s): 9e589a4
Update README.md
README.md CHANGED
@@ -56,7 +56,7 @@ Experiments on general-domain data suggest that, given it's specialised training
 #### Training Hyperparameters

 - sequence length: 256
-- learning rate
+- learning rate 7.5e-5
 - linear learning rate schedule with 10,770 warmup steps
 - effective batch size 1500 (15 sequences per batch x 100 gradient accumulation steps)
 - MLM masking probability 0.15
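
The hyperparameters in this hunk can be sanity-checked with a short sketch. The block below is illustrative only and is not taken from the repository. It assumes that the "linear learning rate schedule" means linear warmup to the peak rate of 7.5e-5 followed by linear decay to zero, which is a common convention but is not stated in the README. `TOTAL_STEPS` is a hypothetical placeholder, since the total number of training steps is not given, and the function names are invented for this example.

```python
# Values stated in the README hunk.
PEAK_LR = 7.5e-5
WARMUP_STEPS = 10_770
SEQUENCES_PER_BATCH = 15
GRAD_ACCUM_STEPS = 100

# Hypothetical: the README does not state the total number of optimizer steps.
TOTAL_STEPS = 100_000


def effective_batch_size(per_batch: int, accum_steps: int) -> int:
    """Sequences contributing to one optimizer update."""
    return per_batch * accum_steps


def linear_schedule_lr(
    step: int,
    peak_lr: float = PEAK_LR,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = TOTAL_STEPS,
) -> float:
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to peak_lr, then linear decay to 0
    at total_steps. The decay-to-zero part is not confirmed by the README.
    """
    if step < warmup_steps:
        return peak_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return peak_lr * (remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(SEQUENCES_PER_BATCH, GRAD_ACCUM_STEPS))
    for s in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS):
        print(f"step {s:>7}: lr = {linear_schedule_lr(s):.3e}")
```

Under these assumptions the rate reaches its peak exactly at step 10,770 and the effective batch size works out to 15 x 100 = 1500, matching the README.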