Aidan Mannion committed on
Commit b64e6bb · 1 Parent(s): 9e589a4

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -56,7 +56,7 @@ Experiments on general-domain data suggest that, given it's specialised training
  #### Training Hyperparameters

  - sequence length: 256
- - learning rate $7.5\times10^{-5}$
+ - learning rate 7.5e-5
  - linear learning rate schedule with 10,770 warmup steps
  - effective batch size 1500 (15 sequences per batch x 100 gradient accumulation steps)
  - MLM masking probability 0.15
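
For readers who want to see how these hyperparameters fit together, here is a minimal sketch in plain Python. It checks the effective batch size arithmetic and evaluates a linear schedule with warmup at a given step. The README does not state the total number of training steps, so `TOTAL_STEPS` is a hypothetical placeholder. The linear decay to zero after warmup is also an assumption about what "linear learning rate schedule" means here. The function names are illustrative and not taken from the model's training code.

```python
# Values taken from the hyperparameter list in the README.
PEAK_LR = 7.5e-5
WARMUP_STEPS = 10_770
SEQUENCES_PER_BATCH = 15
GRAD_ACCUM_STEPS = 100

# Assumption: the total number of optimizer steps is not given in the README.
# This placeholder only makes the sketch runnable.
TOTAL_STEPS = 100_000


def effective_batch_size(sequences_per_batch: int, accum_steps: int) -> int:
    """Sequences contributing to one optimizer update."""
    return sequences_per_batch * accum_steps


def linear_lr(step: int,
              peak_lr: float = PEAK_LR,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay.

    Assumption: after warmup, the rate decays linearly to zero at `total_steps`.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to the peak rate.
        return peak_lr * step / warmup_steps
    # Decay phase: ramp linearly from the peak rate down to 0.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(SEQUENCES_PER_BATCH, GRAD_ACCUM_STEPS))
    for s in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS):
        print(f"step {s:>7}: lr = {linear_lr(s):.3e}")
```

With gradient accumulation, the optimizer steps once per 100 batches of 15 sequences, which gives the effective batch size of 1500 in the list above. If the schedule is stepped once per optimizer update, the 10,770 warmup steps would count optimizer updates rather than individual batches.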