Philip May committed
Commit · 9c4ee35 · 1 Parent(s): 8e40b5d
Update README.md
README.md CHANGED
```diff
@@ -37,12 +37,13 @@ The training was conducted with the following hyperparameters:
 - warmup_ratio: 0.3
 - number of train epochs: 10
 - gradient accumulation steps: 2
+- learning rate: 5e-5
 
 ## Datasets and Preprocessing
 
 The datasets were preprocessed as follows:
 
-The summary was tokenized with the [google/mt5-small](https://huggingface.co/google/mt5-small) tokenizer. Then only the records with no more than 94 tokens were selected.
+The summary was tokenized with the [google/mt5-small](https://huggingface.co/google/mt5-small) tokenizer. Then only the records with no more than 94 summary tokens were selected.
 
 The MLSUM dataset has a special characteristic. In the text, the summary is often included completely as one or more sentences. These have been removed from the texts. The reason is that we do not want to train a model that ultimately extracts only sentences as a summary.
 
```
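For readers who want to reproduce the configuration, the hyperparameters in this hunk map naturally onto Hugging Face `Seq2SeqTrainingArguments`. This is a minimal sketch, assuming the `transformers` trainer API is used; `output_dir` and the batch size are illustrative placeholders, not values from the commit.

```python
# Sketch: the hyperparameters from the diff expressed as Hugging Face
# Seq2SeqTrainingArguments. Values not in the commit are placeholders.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./mt5-small-mlsum",   # placeholder, not from the commit
    warmup_ratio=0.3,                 # from the diff
    num_train_epochs=10,              # from the diff
    gradient_accumulation_steps=2,    # from the diff
    learning_rate=5e-5,               # the value added in this commit
    per_device_train_batch_size=8,    # placeholder, not from the commit
)
```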
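The tokenize-and-filter step changed in this diff (keep only records with no more than 94 summary tokens) could be implemented roughly as below. The column name `summary`, the choice of `add_special_tokens=False`, and the use of `datasets.Dataset.filter` are assumptions; the commit only describes the rule in prose.

```python
# Sketch of the length filter described in the README: tokenize each
# summary with the google/mt5-small tokenizer and keep only records
# with no more than 94 summary tokens. The "summary" column name and
# the exclusion of special tokens are assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")

def short_enough(record, max_tokens=94):
    token_ids = tokenizer(record["summary"], add_special_tokens=False)["input_ids"]
    return len(token_ids) <= max_tokens

# With a Hugging Face `datasets` Dataset this would be applied as:
# dataset = dataset.filter(short_enough)
```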
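The MLSUM-specific cleanup (removing summary sentences that appear verbatim in the article text, so the model does not learn to merely extract sentences) might look like the following sketch. The regex sentence splitter and the exact-match criterion are assumptions; the actual preprocessing code is not part of this commit.

```python
# Sketch of the MLSUM cleanup described in the README: when a summary
# sentence appears verbatim in the article text, remove it. Exact string
# matching and a naive sentence splitter are assumptions, not the
# author's code.
import re

def remove_summary_sentences(text: str, summary: str) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", summary)
    for sentence in sentences:
        sentence = sentence.strip()
        if sentence and sentence in text:
            text = text.replace(sentence, "")
    # Collapse the double spaces left behind by removed sentences.
    return re.sub(r"\s{2,}", " ", text).strip()
```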