Commit 30846b9
1 Parent(s): 9c3f3fb
Update README.md
README.md CHANGED
@@ -185,7 +185,7 @@ parameters:
   encoder_no_repeat_ngram_size: 3
   num_beams: 4
 model-index:
-- name:
+- name: Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary
   results:
   - task:
       type: summarization
@@ -499,7 +499,7 @@ from transformers import pipeline
 
 summarizer = pipeline(
     "summarization",
-    "
+    "Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary",
     device=0 if torch.cuda.is_available() else -1,
 )
 long_text = "Here is a lot of text I don't want to read. Replace me"
@@ -508,37 +508,6 @@ result = summarizer(long_text)
 print(result[0]["summary_text"])
 ```
 
-Pass [other parameters related to beam search textgen](https://huggingface.co/blog/how-to-generate) when calling `summarizer` to get even higher quality results.
-
-## Intended uses & limitations
-
-- The current checkpoint is fairly well converged but will be updated if further improvements can be made.
-- Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
-- while this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.
-
-## Training and evaluation data
-
-`kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209). Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate "partial" summaries.
-
-
-
-### How to run inference over a very long (30k+ tokens) document in batches?
-
-See `summarize.py` in [the code for my hf space Document Summarization](https://huggingface.co/spaces/pszemraj/document-summarization/blob/main/summarize.py) :)
-
-You can also use the same code to split a document into batches of 4096, etc., and run over those with the model. This is useful in situations where CUDA memory is limited.
-
-### How to fine-tune further?
-
-See [train with a script](https://huggingface.co/docs/transformers/run_scripts) and [the summarization scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization).
-
-This model was originally tuned on Google Colab with a heavily modified variant of the [longformer training notebook](https://github.com/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb), key enabler being deepspeed. You can try this as an alternate route to fine-tuning the model without using the command line.
-
-* * *
-
-## Training procedure
-
-
 ### Training hyperparameters
 
 _NOTE: early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens** and then trained further for 10+ epochs._
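For reference, the usage snippet touched by the second and third hunks, assembled into one self-contained example. The model id is taken from the added line 502; the `torch` import is an assumption inferred from the `torch.cuda.is_available()` call shown in the README context.

```python
import torch
from transformers import pipeline

# Model id comes from the added line 502; everything else mirrors the README snippet.
summarizer = pipeline(
    "summarization",
    "Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,  # GPU 0 if available, else CPU
)

long_text = "Here is a lot of text I don't want to read. Replace me"
result = summarizer(long_text)
print(result[0]["summary_text"])
```

On a machine without a GPU the pipeline falls back to CPU (`device=-1`).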
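The removed line 511 recommended passing beam-search generation parameters when calling `summarizer`. A sketch of what that could look like, reusing `summarizer` and `long_text` from the example above; `num_beams=4` and `encoder_no_repeat_ngram_size=3` mirror the model-card metadata in the first hunk, while the remaining keyword arguments are illustrative assumptions.

```python
# Generation kwargs are forwarded to model.generate() by the pipeline call.
# num_beams and encoder_no_repeat_ngram_size mirror the model-card metadata;
# no_repeat_ngram_size, early_stopping, and max_length are illustrative values.
result = summarizer(
    long_text,
    num_beams=4,
    encoder_no_repeat_ngram_size=3,
    no_repeat_ngram_size=3,
    early_stopping=True,
    max_length=512,
)
print(result[0]["summary_text"])
```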
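The removed section "How to run inference over a very long (30k+ tokens) document in batches?" points to the `summarize.py` in the linked Space and suggests splitting a document into batches of roughly 4096 tokens. A minimal sketch of that chunking idea, again reusing `summarizer` from above; the helper name and chunking logic are illustrative, not the Space's implementation.

```python
from transformers import AutoTokenizer

# Tokenizer for the same checkpoint, used only to split the document on token
# boundaries into ~4096-token chunks, as the removed section suggests.
tokenizer = AutoTokenizer.from_pretrained(
    "Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary"
)

def summarize_in_chunks(text, chunk_tokens=4096):
    """Split `text` into token chunks and summarize each chunk independently."""
    ids = tokenizer(text, truncation=False)["input_ids"]
    chunks = [ids[i : i + chunk_tokens] for i in range(0, len(ids), chunk_tokens)]
    return [
        summarizer(tokenizer.decode(chunk, skip_special_tokens=True))[0]["summary_text"]
        for chunk in chunks
    ]

# chunk_summaries = summarize_in_chunks(very_long_document)
```

Chunking keeps peak CUDA memory bounded by the chunk size rather than the full document length, which is the limitation the removed section calls out.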
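The removed "How to fine-tune further?" section points to the Hugging Face training scripts and a modified Longformer/DeepSpeed notebook. As an assumption-laden outline only (not the README's procedure), further tuning via the generic `Seq2SeqTrainer` API might look roughly like this; the `kmfoda/booksum` column names and all hyperparameter values are placeholders to verify before use.

```python
# Outline only: generic Seq2SeqTrainer route instead of the Colab/DeepSpeed
# notebook mentioned in the removed section. Column names and hyperparameters
# are placeholders; check the dataset card before running.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_id = "Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

raw = load_dataset("kmfoda/booksum")

def preprocess(batch):
    # "chapter" and "summary_text" are assumed source/summary column names.
    enc = tokenizer(batch["chapter"], max_length=16384, truncation=True)  # 16384 = model input length
    enc["labels"] = tokenizer(
        text_target=batch["summary_text"], max_length=1024, truncation=True  # 1024-token summary cap
    )["input_ids"]
    return enc

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="long-t5-booksum-ft",
        per_device_train_batch_size=1,   # long inputs: keep the per-device batch tiny
        gradient_accumulation_steps=16,
        learning_rate=4e-5,
        num_train_epochs=1,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```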