add what's new
README.md CHANGED
@@ -353,8 +353,7 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl
 
 ## Intended uses & limitations
 
--
-- I plan to update this page with newer checkpoints and post some metrics over time.
+- The current checkpoint is fairly well converged but will be updated if further improvements can be made.
 - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
 - while this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.
 
@@ -368,19 +367,20 @@ _NOTE: early checkpoints of this model were trained on a "smaller" subsection of
 
 ### Updates:
 
+- July 22, 2022: updated to a fairly converged checkpoint
 - July 3, 2022: Added a new version with several epochs of additional training that is more performant in general.
 
 ### Training hyperparameters
 
 The following hyperparameters were used during the **most recent** training round\*:
 
-- learning_rate: 0.
+- learning_rate: 0.0005
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
 - distributed_type: multi-GPU
-- gradient_accumulation_steps:
-- total_train_batch_size:
+- gradient_accumulation_steps: 128
+- total_train_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.01
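The context line of the first hunk references passing beam-search text-generation parameters to the model. A minimal sketch of what that looks like through the `transformers` summarization `pipeline`; the checkpoint id and the parameter values below are illustrative assumptions, not settings taken from this card:

```python
# Sketch: forwarding beam-search generation parameters through a
# summarization pipeline. The checkpoint id is an assumed placeholder;
# substitute the repo id of the model this card describes.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-base-16384-book-summary",  # assumption
)

long_text = "..."  # the (long) document to summarize goes here

result = summarizer(
    long_text,
    max_length=256,
    min_length=8,
    num_beams=4,             # beam search width
    early_stopping=True,     # stop once all beams have finished
    no_repeat_ngram_size=3,  # suppress repeated trigrams
    repetition_penalty=3.5,  # discourage token-level repetition
)
print(result[0]["summary_text"])
```

Keyword arguments accepted by `model.generate()` can typically be forwarded through the pipeline call this way.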
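The hyperparameter list in the second hunk corresponds one-to-one with standard Hugging Face training arguments. A hedged sketch of that mapping, assuming `Seq2SeqTrainingArguments` from `transformers` (an illustration of the reported values, not the author's actual training script):

```python
# Sketch: the hyperparameters reported above, expressed as
# Seq2SeqTrainingArguments. Illustrative only -- not the author's script;
# output_dir is a placeholder.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./outputs",           # placeholder
    learning_rate=5e-4,               # learning_rate: 0.0005
    per_device_train_batch_size=1,    # train_batch_size: 1
    per_device_eval_batch_size=1,     # eval_batch_size: 1
    gradient_accumulation_steps=128,  # gradient_accumulation_steps: 128
    seed=42,                          # seed: 42
    lr_scheduler_type="cosine",       # lr_scheduler_type: cosine
    warmup_ratio=0.01,                # lr_scheduler_warmup_ratio: 0.01
    adam_beta1=0.9,                   # optimizer: Adam with betas=(0.9,0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                # ...and epsilon=1e-08
)
```

With a per-device batch size of 1 and 128 gradient-accumulation steps, gradients are accumulated across 128 examples before each optimizer step, which is where the reported total_train_batch_size of 128 comes from.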