pszemraj committed
Commit 846fb35 · 1 Parent(s): dbb351e

add whats new

Files changed (1): README.md (+5 -5)
README.md CHANGED
@@ -353,8 +353,7 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl
 
 ## Intended uses & limitations
 
-- At the time of writing, the model is not _fully converged_ despite training for 20+ epochs. This checkpoint is serviceable enough (see examples).
-- I plan to update this page with newer checkpoints and post some metrics over time.
+- The current checkpoint is fairly well converged but will be updated if further improvements can be made.
 - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
 - while this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.
 
@@ -368,19 +367,20 @@ _NOTE: early checkpoints of this model were trained on a "smaller" subsection of
 
 ### Updates:
 
+- July 22, 2022: updated to a fairly converged checkpoint
 - July 3, 2022: Added a new version with several epochs of additional training that is more performant in general.
 
 ### Training hyperparameters
 
 The following hyperparameters were used during the **most recent** training round\*:
 
-- learning_rate: 0.0006
+- learning_rate: 0.0005
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
 - distributed_type: multi-GPU
-- gradient_accumulation_steps: 64
-- total_train_batch_size: 64
+- gradient_accumulation_steps: 128
+- total_train_batch_size: 128
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.01
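
The updated hyperparameters map one-to-one onto Hugging Face `Seq2SeqTrainingArguments` fields. Below is a minimal sketch of the new configuration, assuming a `transformers` Trainer-based setup (not confirmed by the commit); the output path is illustrative, and the multi-GPU distribution is handled by the launcher rather than by these arguments.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the most recent training round's settings (assumed Trainer setup).
# Effective batch size: 1 per device x 128 accumulation steps = 128,
# matching the total_train_batch_size reported above.
training_args = Seq2SeqTrainingArguments(
    output_dir="./book-summary-checkpoint",  # illustrative path, not from the commit
    learning_rate=5e-4,  # 0.0005, lowered from 0.0006 in this commit
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=128,  # raised from 64
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer default,
    # so no optimizer-specific arguments are needed here.
)
```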
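For the LED-base comparison noted under intended uses, both checkpoints can be queried with identical generation settings via the `summarization` pipeline. A sketch under assumptions: the beam-search values below are placeholders rather than the README's actual API parameters, and the model id should be swapped between the two repos being compared.

```python
from transformers import pipeline

# LED-base repo id is from the README; swap in this model's repo id to compare.
summarizer = pipeline("summarization", model="pszemraj/led-base-book-summary")

long_text = "..."  # replace with the document to summarize

# Placeholder beam-search parameters; the README states only that the same
# generation parameters were used for both models, not their values.
result = summarizer(
    long_text,
    num_beams=4,
    no_repeat_ngram_size=3,
    early_stopping=True,
    min_length=8,
    max_length=256,
)
print(result[0]["summary_text"])
```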