:art:
README.md CHANGED
@@ -67,7 +67,7 @@ inference:
 # long-t5-tglobal-base-16384-booksum

 - summarize long text and get a SparkNotes-esque summary of arbitrary topics!
-- generalizes
+- generalizes reasonably well to academic & narrative text.

 ## Cheeky Proof-of-Concept

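The summarization claim above boils down to a one-liner with the `transformers` pipeline; a minimal sketch, assuming the checkpoint is pulled by its Hub repo id (the id below is a placeholder that mirrors the README title, not something stated in this diff):

```python
# Minimal sketch: summarize a long document with this checkpoint.
# "user/long-t5-tglobal-base-16384-booksum" is a placeholder repo id.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="user/long-t5-tglobal-base-16384-booksum",
)

long_text = "..."  # paste the full article / chapter / report here
result = summarizer(
    long_text,
    max_length=1024,        # summaries were capped at 1024 tokens in training
    min_length=8,
    no_repeat_ngram_size=3,
)
print(result[0]["summary_text"])
```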
@@ -80,9 +80,9 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/

 A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:

--
-- all training
-- early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens
+- 20+ epochs of fine-tuning from the base model on V100/A100 GPUs
+- all training used 16384 token input / 1024 max output
+- early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens** and then trained further for at least five epochs.

 ## Intended uses & limitations

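The new bullets pin down the sequence budget (16384-token inputs, 1024-token summaries). A hedged sketch of what that looks like at inference time, again with a placeholder repo id:

```python
# Sketch of the 16384-token input / 1024-token output budget described above.
# The repo id is a placeholder; generation settings are illustrative only.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "user/long-t5-tglobal-base-16384-booksum"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "..."  # long input document
inputs = tokenizer(text, truncation=True, max_length=16384, return_tensors="pt")
with torch.no_grad():
    summary_ids = model.generate(**inputs, max_length=1024, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```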
@@ -92,7 +92,7 @@ A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/goo

 ## Training and evaluation data

-`kmfoda/booksum` dataset. Summaries longer than 1024 LongT5 tokens were filtered out
+`kmfoda/booksum` dataset. Summaries longer than 1024 LongT5 tokens were filtered out with the intent of preventing the model from learning to generate "partial" summaries.

 ## Training procedure

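A minimal sketch of the length filter described in the new line, i.e. dropping `kmfoda/booksum` examples whose reference summary exceeds 1024 LongT5 tokens; the `summary_text` column name is an assumption about the dataset schema:

```python
# Sketch of the 1024-token summary filter; "summary_text" is an assumed column name.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
booksum = load_dataset("kmfoda/booksum", split="train")

def summary_fits(example):
    # keep only examples whose target summary is at most 1024 LongT5 tokens
    return len(tokenizer(example["summary_text"]).input_ids) <= 1024

booksum = booksum.filter(summary_fits)
```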
@@ -111,7 +111,7 @@ The following hyperparameters were used during the **final** training round\*:
 - lr_scheduler_warmup_ratio: 0.02
 - num_epochs: 2

-\*_Prior training sessions used roughly similar parameters
+\*_Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train_

 ### Training results

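For the two hyperparameters visible in this hunk, a hedged sketch of how they might map onto `Seq2SeqTrainingArguments`; everything not shown in the excerpt is a placeholder, not the author's actual configuration:

```python
# Hedged mapping of the listed hyperparameters onto Seq2SeqTrainingArguments.
# Only num_train_epochs and warmup_ratio come from the card; the rest are placeholders.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-booksum-finetune",  # placeholder
    num_train_epochs=2,                     # num_epochs: 2
    warmup_ratio=0.02,                      # lr_scheduler_warmup_ratio: 0.02
    predict_with_generate=True,
)
```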