Commit 53d96f5 (parent 8f7a0c7): Create README.md

README.md (added)
---
language:
- en
tags:
- simcls
datasets:
- billsum
---

# SimCLS

SimCLS is a framework for abstractive summarization presented in [SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization](https://arxiv.org/abs/2106.01890).
It is a two-stage approach consisting of a *generator* and a *scorer*. In the first stage, a large pre-trained model for abstractive summarization (the *generator*) produces candidate summaries; in the second stage, the *scorer* assigns a score to each candidate given the source document. The final summary is the highest-scoring candidate.
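The candidate-selection step can be sketched as follows. This is a minimal illustration with a stand-in word-overlap scorer, not the actual trained scorer model; `select_summary` and `overlap_score` are hypothetical names introduced here.

```python
# Sketch of SimCLS-style candidate selection: the generator proposes several
# candidate summaries, the scorer ranks them against the source document, and
# the top-ranked candidate is returned.

def select_summary(document, candidates, score):
    # score(document, candidate) -> float; higher means a better candidate
    return max(candidates, key=lambda c: score(document, c))

# Toy stand-in scorer: counts words shared between document and candidate.
def overlap_score(document, candidate):
    return len(set(document.split()) & set(candidate.split()))

doc = "the bill amends the tax code"
cands = ["a bill about taxes", "the bill amends the tax code provisions"]
print(select_summary(doc, cands, overlap_score))
# -> "the bill amends the tax code provisions"
```

The real scorer replaces `overlap_score` with a learned relevance model, but the argmax-over-candidates structure is the same.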
This model is the *scorer* trained for summarization of BillSum ([paper](https://arxiv.org/abs/1910.00523), [dataset](https://huggingface.co/datasets/billsum)). It should be used in conjunction with [google/pegasus-billsum](https://huggingface.co/google/pegasus-billsum). See [our GitHub repository](https://github.com/andrejmiscic/simcls-pytorch) for details on training, evaluation, and usage.

## Usage

```bash
git clone https://github.com/andrejmiscic/simcls-pytorch.git
cd simcls-pytorch
pip3 install torch torchvision torchaudio transformers sentencepiece
```

```python
from src.model import SimCLS, GeneratorType

summarizer = SimCLS(generator_type=GeneratorType.Pegasus,
                    generator_path="google/pegasus-billsum",
                    scorer_path="andrejmiscic/simcls-scorer-billsum")

document = "This is a legal document."
summary = summarizer(document)
print(summary)
```
### Results

All of our results are reported together with 95% confidence intervals computed using 10,000 iterations of bootstrap resampling. See the [SimCLS paper](https://arxiv.org/abs/2106.01890) for a description of the baselines.
We believe the discrepancies in Rouge-L scores between the original Pegasus work and our evaluation are due to how the metric is computed: we use a summary-level Rouge-L score.

| System          |               Rouge-1 |               Rouge-2 |             Rouge-L\* |
|-----------------|----------------------:|----------------------:|----------------------:|
| Pegasus         |                 57.31 |                 40.19 |                 45.82 |
| **Our results** |                   --- |                   --- |                   --- |
| Origin          | 56.24, [55.74, 56.74] | 37.46, [36.89, 38.03] | 50.71, [50.19, 51.22] |
| Min             | 44.37, [43.85, 44.89] | 25.75, [25.30, 26.22] | 38.68, [38.18, 39.16] |
| Max             | 62.88, [62.42, 63.33] | 43.96, [43.39, 44.54] | 57.50, [57.01, 58.00] |
| Random          | 54.93, [54.43, 55.43] | 35.42, [34.85, 35.97] | 49.19, [48.68, 49.70] |
| **SimCLS**      | 57.49, [57.01, 58.00] | 38.54, [37.98, 39.10] | 51.91, [51.39, 52.43] |
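Confidence intervals of this kind can be reproduced with a percentile-bootstrap procedure like the one below. This is a generic sketch over toy per-document scores; `bootstrap_ci` is a hypothetical helper, and the exact resampling code used for the table may differ.

```python
import random

def bootstrap_ci(scores, n_iter=10000, alpha=0.05, seed=0):
    # Percentile-bootstrap confidence interval for the mean of per-example scores:
    # resample the score list with replacement n_iter times, take the mean of each
    # resample, and read off the alpha/2 and 1 - alpha/2 quantiles.
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(rng.choices(scores, k=n)) / n  # mean of one resample
        for _ in range(n_iter)
    )
    lo = means[int(alpha / 2 * n_iter)]
    hi = means[int((1 - alpha / 2) * n_iter) - 1]
    return lo, hi

# Toy per-document Rouge-1 scores (illustrative values only).
scores = [56.0, 57.5, 55.2, 58.1, 56.8, 54.9, 57.0, 56.3]
low, high = bootstrap_ci(scores)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

In the table above the same idea is applied to per-document Rouge scores over the BillSum test set.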