Update README.md

- 20+ epochs of fine-tuning from the base model on V100/A100 GPUs
- all training used a 16384-token input / 1024-token maximum output configuration (see the preprocessing sketch below)

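As a rough illustration of how those sequence limits translate into preprocessing with Hugging Face `transformers`, here is a minimal sketch that truncates source documents to 16384 tokens and reference summaries to 1024 tokens. The column names are assumptions about the dataset schema, not something documented on this card.

```python
# Preprocessing sketch for the 16384-token input / 1024-token output setup.
# Column names ("chapter", "summary_text") are assumptions about the dataset schema.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")

def preprocess(batch):
    # Truncate source documents to the 16384-token encoder limit used in training.
    model_inputs = tokenizer(batch["chapter"], max_length=16384, truncation=True)
    # Truncate reference summaries to the 1024-token maximum output length.
    labels = tokenizer(text_target=batch["summary_text"], max_length=1024, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```
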
## Intended uses & limitations

- At the time of writing, the model is not _fully converged_ despite training for 20+ epochs. This checkpoint is serviceable enough (see examples).
- I plan to update this page with newer checkpoints and post some metrics over time.
- Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (the API generation parameters are the same); a minimal inference sketch follows this list.
- While this model seems to improve factual consistency, **do not take its summaries as foolproof; check anything that seems odd**.

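Below is a minimal inference sketch using the `transformers` summarization pipeline. The repository id is assumed to be this checkpoint's Hub name, and the generation parameters are illustrative choices rather than the exact API settings referenced above.

```python
# Illustrative usage sketch; the repo id and generation settings are assumptions.
import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-base-16384-book-summary",  # assumed Hub id for this checkpoint
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "..."  # a chapter-length document; inputs up to 16384 tokens are supported
result = summarizer(
    long_text,
    max_length=512,            # cap on generated summary length (training used a 1024-token cap)
    min_length=8,
    no_repeat_ngram_size=3,
    num_beams=4,
    truncation=True,
)
print(result[0]["summary_text"])
```
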
## Training and evaluation data

The model was fine-tuned on the `kmfoda/booksum` dataset. Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate "partial" summaries.

> - Early checkpoints of this model were trained on a "smaller" subset of the dataset, because it was mistakenly filtered to summaries of at most **1024 characters** rather than tokens. This was caught, the filter was corrected to **1024 tokens**, and the model was then trained for at least five more epochs.

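For reference, here is a minimal sketch of that token-length filter using `datasets`; the split and column name (`summary_text`) are assumptions about the `kmfoda/booksum` schema.

```python
# Sketch of dropping reference summaries longer than 1024 LongT5 tokens.
# The split and the "summary_text" column name are assumptions about kmfoda/booksum.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
dataset = load_dataset("kmfoda/booksum", split="train")

def summary_fits(example):
    # Keep only examples whose full reference summary fits in the 1024-token output budget.
    return len(tokenizer(example["summary_text"]).input_ids) <= 1024

filtered = dataset.filter(summary_fits)
print(f"kept {len(filtered)} of {len(dataset)} examples")
```
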
## Training procedure

### Training hyperparameters