:art:
README.md CHANGED
@@ -67,7 +67,7 @@ inference:
 # long-t5-tglobal-base-16384-booksum

 - summarize long text and get a SparkNotes-esque summary of arbitrary topics!
-- generalizes
+- generalizes reasonably well to academic & narrative text.

 ## Cheeky Proof-of-Concept

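The summarization claim above boils down to a one-liner with the `transformers` pipeline; a minimal sketch, assuming the checkpoint is pulled by its Hub repo id (the id below is a placeholder that mirrors the README title, not something stated in this diff):

```python
# Minimal sketch: summarize a long document with this checkpoint.
# "user/long-t5-tglobal-base-16384-booksum" is a placeholder repo id.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="user/long-t5-tglobal-base-16384-booksum",
)

long_text = "..."  # paste the full article / chapter / report here
result = summarizer(
    long_text,
    max_length=1024,        # summaries were capped at 1024 tokens in training
    min_length=8,
    no_repeat_ngram_size=3,
)
print(result[0]["summary_text"])
```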
@@ -80,9 +80,9 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/

 A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:

--
-- all training
-- early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens
+- 20+ epochs of fine-tuning from the base model on V100/A100 GPUs
+- all training used 16384 token input / 1024 max output
+- early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens** and then trained further for at least five epochs.

 ## Intended uses & limitations

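The new bullets pin down the sequence budget (16384-token inputs, 1024-token summaries). A hedged sketch of what that looks like at inference time, again with a placeholder repo id:

```python
# Sketch of the 16384-token input / 1024-token output budget described above.
# The repo id is a placeholder; generation settings are illustrative only.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "user/long-t5-tglobal-base-16384-booksum"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "..."  # long input document
inputs = tokenizer(text, truncation=True, max_length=16384, return_tensors="pt")
with torch.no_grad():
    summary_ids = model.generate(**inputs, max_length=1024, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```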
@@ -92,7 +92,7 @@ A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/goo

 ## Training and evaluation data

-`kmfoda/booksum` dataset. Summaries longer than 1024 LongT5 tokens were filtered out
+`kmfoda/booksum` dataset. Summaries longer than 1024 LongT5 tokens were filtered out with the intent of preventing the model from learning to generate "partial" summaries.

 ## Training procedure

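A minimal sketch of the length filter described in the new line, i.e. dropping `kmfoda/booksum` examples whose reference summary exceeds 1024 LongT5 tokens; the `summary_text` column name is an assumption about the dataset schema:

```python
# Sketch of the 1024-token summary filter; "summary_text" is an assumed column name.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
booksum = load_dataset("kmfoda/booksum", split="train")

def summary_fits(example):
    # keep only examples whose target summary is at most 1024 LongT5 tokens
    return len(tokenizer(example["summary_text"]).input_ids) <= 1024

booksum = booksum.filter(summary_fits)
```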
@@ -111,7 +111,7 @@ The following hyperparameters were used during the **final** training round\*:
 - lr_scheduler_warmup_ratio: 0.02
 - num_epochs: 2

-\*_Prior training sessions used roughly similar parameters
+\*_Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train_

 ### Training results

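For the two hyperparameters visible in this hunk, a hedged sketch of how they might map onto `Seq2SeqTrainingArguments`; everything not shown in the excerpt is a placeholder, not the author's actual configuration:

```python
# Hedged mapping of the listed hyperparameters onto Seq2SeqTrainingArguments.
# Only num_train_epochs and warmup_ratio come from the card; the rest are placeholders.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-booksum-finetune",  # placeholder
    num_train_epochs=2,                     # num_epochs: 2
    warmup_ratio=0.02,                      # lr_scheduler_warmup_ratio: 0.02
    predict_with_generate=True,
)
```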