Aidan Mannion committed
Commit 4fd6b3a · 1 Parent(s): 526d457

Update README.md

Files changed (1): README.md (+1 -0)
README.md CHANGED
@@ -60,6 +60,7 @@ Experiments on general-domain data suggest that, given it's specialised training
 - linear learning rate schedule with 10,770 warmup steps
 - effective batch size 1500 (15 sequences per batch x 100 gradient accumulation steps)
 - MLM masking probability 0.15
+
 **Training regime:** The model was trained with fp16 non-mixed precision, using the AdamW optimizer with default parameters.
 
 
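The hunk above records the pre-training hyperparameters for this model. As a rough illustration only, here is a minimal sketch of that configuration in PyTorch with Hugging Face transformers; the checkpoint name, corpus, and total step count are placeholders not taken from the commit, and true non-mixed fp16 training usually also needs loss scaling, which is omitted here for brevity.

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    get_linear_schedule_with_warmup,
)

# Placeholder checkpoint; the commit does not name the base model here.
model_name = "some-org/some-mlm-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# fp16 non-mixed precision: the whole model is cast to half precision.
model = AutoModelForMaskedLM.from_pretrained(model_name).half().cuda()

# MLM masking probability 0.15, per the README.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

# Placeholder corpus; 15 sequences per batch, per the README.
texts = ["example sentence one", "example sentence two"]  # stand-in data
encodings = [tokenizer(t, truncation=True, max_length=128) for t in texts]
dataloader = DataLoader(encodings, batch_size=15, collate_fn=collator)

optimizer = AdamW(model.parameters())  # AdamW with default parameters
total_steps = 100_000  # placeholder; the total step count is not stated in this hunk
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10_770, num_training_steps=total_steps
)

accumulation_steps = 100  # 15 sequences x 100 steps = effective batch size 1500
model.train()
for step, batch in enumerate(dataloader):
    batch = {k: v.cuda() for k, v in batch.items()}
    loss = model(**batch).loss / accumulation_steps  # scale loss for accumulation
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        scheduler.step()  # linear schedule with 10,770 warmup steps
        optimizer.zero_grad()
```

Gradient accumulation is what turns the per-step batch of 15 sequences into the effective batch size of 1500 described in the README: the optimizer only steps once every 100 backward passes.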