pszemraj committed
Commit 846fb35 · 1 Parent(s): dbb351e

add whats new

Files changed (1): README.md (+5 -5)
README.md CHANGED
@@ -353,8 +353,7 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl
 
 ## Intended uses & limitations
 
-- At the time of writing, the model is not _fully converged_ despite training for 20+ epochs. This checkpoint is serviceable enough (see examples).
-- I plan to update this page with newer checkpoints and post some metrics over time.
+- The current checkpoint is fairly well converged but will be updated if further improvements can be made.
 - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
 - while this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.
 
@@ -368,19 +367,20 @@ _NOTE: early checkpoints of this model were trained on a "smaller" subsection of
 
 ### Updates:
 
+- July 22, 2022: updated to a fairly converged checkpoint
 - July 3, 2022: Added a new version with several epochs of additional training that is more performant in general.
 
 ### Training hyperparameters
 
 The following hyperparameters were used during the **most recent** training round\*:
 
-- learning_rate: 0.0006
+- learning_rate: 0.0005
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
 - distributed_type: multi-GPU
-- gradient_accumulation_steps: 64
-- total_train_batch_size: 64
+- gradient_accumulation_steps: 128
+- total_train_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.01
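
The updated hyperparameters map one-to-one onto Hugging Face `Seq2SeqTrainingArguments` fields. Below is a minimal sketch of the new configuration, assuming a `transformers` Trainer-based setup (not confirmed by the commit); the output path is illustrative, and the multi-GPU distribution is handled by the launcher rather than by these arguments.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the most recent training round's settings (assumed Trainer setup).
# Effective batch size: 1 per device x 128 accumulation steps = 128,
# matching the total_train_batch_size reported above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./book-summary-checkpoint",  # illustrative path, not from the commit
    learning_rate=5e-4,  # 0.0005, lowered from 0.0006 in this commit
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=128,  # raised from 64
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer default,
    # so no optimizer-specific arguments are needed here.
)
```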
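For the LED-base comparison noted under intended uses, both checkpoints can be queried with identical generation settings via the `summarization` pipeline. A sketch under assumptions: the beam-search values below are placeholders rather than the README's actual API parameters, and the model id should be swapped between the two repos being compared.

```python
from transformers import pipeline

# LED-base repo id is from the README; swap in this model's repo id to compare.
summarizer = pipeline("summarization", model="pszemraj/led-base-book-summary")

long_text = "..."  # replace with the document to summarize

# Placeholder beam-search parameters; the README states only that the same
# generation parameters were used for both models, not their values.
result = summarizer(
    long_text,
    num_beams=4,
    no_repeat_ngram_size=3,
    early_stopping=True,
    min_length=8,
    max_length=256,
)
print(result[0]["summary_text"])
```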