Update README.md
README.md (changed):
```diff
@@ -90,7 +90,7 @@ Whole Word Masking 単語分割器には、[vibrato](https://github.com/daac-too
 | Batch Size (tokens) | 1146880 | 2293760 |
 | Max Learning Rate | 1.0E-4 | 1.0E-4 |
 | Min Learning Rate | 1.0E-6 | N/A |
-| Learning Rate Warmup Steps | 10000 |
+| Learning Rate Warmup Steps | 10000 | N/A |
 | Scheduler | cosine | constant |
 | Optimizer | AdamW | AdamW |
 | Optimizer Config | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 |
@@ -234,7 +234,7 @@ We only implemented Masked Language Modeling (MLM) during training, without Next
 | Batch Size (tokens) | 1146880 | 2293760 |
 | Max Learning Rate | 1.0E-4 | 1.0E-4 |
 | Min Learning Rate | 1.0E-6 | N/A |
-| Learning Rate Warmup Steps | 10000 |
+| Learning Rate Warmup Steps | 10000 | N/A |
 | Scheduler | cosine | constant |
 | Optimizer | AdamW | AdamW |
 | Optimizer Config | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 | beta_1 = 0.9, beta_2 = 0.999, eps = 1.0E-8 |
```
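The schedule described by the first column of the table (linear warmup for 10000 steps to a max learning rate of 1.0E-4, then cosine decay down to a min of 1.0E-6) can be sketched as below. This is an illustrative sketch only, not code from the repository; `total_steps` is a hypothetical placeholder, since the total step count is not part of this diff.

```python
import math

def lr_at_step(step, max_lr=1.0e-4, min_lr=1.0e-6,
               warmup_steps=10_000, total_steps=500_000):
    """Linear warmup followed by cosine decay.

    max_lr, min_lr, and warmup_steps match the table above;
    total_steps is a hypothetical placeholder.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to max_lr.
        return max_lr * step / warmup_steps
    # Cosine decay from max_lr down to min_lr.
    progress = min((step - warmup_steps) / (total_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The second column's `constant` scheduler with `N/A` min LR and warmup would correspond to simply returning `max_lr` for every step.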