Aidan Mannion committed
Commit ee1b6d6 · 1 Parent(s): e0b4b0e

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -63,6 +63,7 @@ Experiments on general-domain data suggest that, given its specialised training
 - linear learning rate schedule with 10,770 warmup steps
 - effective batch size 1500 (15 sequences per batch x 100 gradient accumulation steps)
 - MLM masking probability 0.15
+
 **Training regime:** The model was trained with fp16 non-mixed precision, using the AdamW optimizer with default parameters.
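
For reference, the hyperparameters in this hunk map fairly directly onto Hugging Face `transformers` training configuration. The sketch below is illustrative only and is not taken from the commit: the checkpoint name and output directory are placeholders, and `Trainer`'s `fp16` flag enables *mixed* precision, whereas the README specifies non-mixed fp16.

```python
# Hypothetical sketch of the README's hyperparameters as a transformers config.
# Checkpoint and output_dir are placeholders, not taken from this commit.
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder checkpoint

# MLM masking probability 0.15, as listed in the hunk
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mlm-pretraining",     # placeholder
    per_device_train_batch_size=15,   # 15 sequences per batch...
    gradient_accumulation_steps=100,  # ...x 100 accumulation steps = effective batch size 1500
    lr_scheduler_type="linear",       # linear learning rate schedule
    warmup_steps=10_770,              # 10,770 warmup steps
    optim="adamw_torch",              # AdamW with default parameters
    fp16=True,                        # NB: this enables *mixed* precision; the README
                                      # states the model used non-mixed fp16
)
```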