Update README.md
README.md
CHANGED
@@ -1,149 +0,0 @@
---
language:
- en
tags:
- summarization
datasets:
- xsum
metrics:
- rouge
widget:
- text: National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed
    to buy rival Samba Financial Group for $15 billion in the biggest banking takeover
    this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to
    a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer
    0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio
    the banks set when they signed an initial framework agreement in June.The offer
    is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24%
    higher than the level the shares traded at before the talks were made public.
    Bloomberg News first reported the merger discussions.The new bank will have total
    assets of more than $220 billion, creating the Gulf region’s third-largest lender.
    The entity’s $46 billion market capitalization nearly matches that of Qatar National
    Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion
    of assets.
model-index:
- name: human-centered-summarization/financial-summarization-pegasus
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: xsum
      type: xsum
      config: default
      split: test
    metrics:
    - type: rouge
      value: 35.2055
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTA5OTZkY2YxMDU1YzE3NGJlMmE1OTg1NjlmNzcxOTg4YzY2OThlOTlkNGFhMGFjZWY4YjdiMjU5NDdmMWYzNSIsInZlcnNpb24iOjF9.ufBRoV2JoX4UlEfAUOYq7F3tZougwngdpKlnaC37tYXJU3omsR5hTsWM69hSdYO-k0cKUbAWCAMzjmoGwIaPAw
    - type: rouge
      value: 16.5689
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWQwMmM2NjJjNzM1N2Y3NjZmMmE5NzNlNjRjNjEwNzNhNjcyZTRiMGRlODY3NWUyMGQ0YzZmMGFhODYzOTRmOSIsInZlcnNpb24iOjF9.AZZkbaYBZG6rw6-QHYjRlSl-p0gBT2EtJxwjIP7QYH5XIQjeoiQsTnDPIq25dSMDbmQLSZnpHC104ZctX0f_Dg
    - type: rouge
      value: 30.1285
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTRjYThlMTllZjI4MGFiMDZhZTVkYmRjMTNhZDUzNTQ0OWQyNDQxMmQ5ODJiMmJiNGI3OTAzYjhiMzc2MTI4NCIsInZlcnNpb24iOjF9.zTHd3F4ZlgS-azl-ZVjOckcTrtrJmDOGWVaC3qQsvvn2UW9TnseNkmo7KBc3DJU7_NmlxWZArl1BdSetED0NCg
    - type: rouge
      value: 30.1706
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGMzZGFjNzVkYWI0NTJkMmZjZDQ0YjhiYjIxN2VkNmJjMTgwZTk1NjFlOGU2NjNjM2VjYTNlYTBhNTQ5MGZkNSIsInZlcnNpb24iOjF9.xQ2LoI3PwlEiXo1OT2o4Pq9o2thYCd9lSCKCWlLmZdxI5GxdsjcASBKmHKopzUcwCGBPR7zF95MHSAPyszOODA
    - type: loss
      value: 2.7092134952545166
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzQzODE0NDc5YTYzYjJlMWU2YTVjOGRjN2JmYWVkOWNkNTRlMTZlOWIyN2NiODJkMDljMjI3YzZmYzM3N2JjYSIsInZlcnNpb24iOjF9.Vv_pdeFuRMoKK3cPr5P6n7D6_18ChJX-2qcT0y4is3XX3mS98fk3U1AYEuy9nBHOwYR3o0U8WBgQ-Ya_FqefBg
    - type: gen_len
      value: 15.1414
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjk5OTk3NWRiNjZlZmQzMmYwOTU2MmQwOWE1MDNlNTg3YWVkOTgwOTc2ZTQ0MTBiZjliOWMyZTYwMDI2MDUzYiIsInZlcnNpb24iOjF9.Zvj84JzIhM50rWTQ2GrEeOU7HrS8KsILH-8ApTcSWSI6kVnucY0MyW2ODxvRAa_zHeCygFW6Q13TFGrT5kLNAA
---

### PEGASUS for Financial Summarization

This model was fine-tuned on a novel financial news dataset of 2K articles from [Bloomberg](https://www.bloomberg.com/europe), covering topics such as stocks, markets, currencies, rates, and cryptocurrencies.

It is based on the [PEGASUS](https://huggingface.co/transformers/model_doc/pegasus.html) model, specifically the PEGASUS checkpoint fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum model](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).

### How to use
The snippet below shows how to use this model for financial summarization in PyTorch.

```Python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Load the model and the tokenizer.
# To use the TensorFlow model instead, import TFPegasusForConditionalGeneration
# and use it in place of PegasusForConditionalGeneration.
model_name = "human-centered-summarization/financial-summarization-pegasus"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# Some text to summarize
text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."

# Tokenize the text.
# When running in TensorFlow, request TensorFlow tensors with return_tensors="tf".
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids

# Generate the summary (beam search here, but any other decoding strategy works too)
output = model.generate(
    input_ids,
    max_length=32,
    num_beams=5,
    early_stopping=True
)

# Decode and print the generated summary
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Generated Output: Saudi bank to pay a 3.5% premium to Samba share price. Gulf region’s third-largest lender will have total assets of $220 billion
```

## Evaluation Results
The ROUGE scores of the model before and after fine-tuning on our dataset are shown below:


| Fine-tuning | R-1   | R-2   | R-L    | R-S   |
|:-----------:|:-----:|:-----:|:------:|:-----:|
| Yes         | 23.55 | 6.99  | 18.14  | 21.36 |
| No          | 13.8  | 2.4   | 10.63  | 12.03 |
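
For readers unfamiliar with the R-1 and R-2 columns, the following is a minimal sketch of how a ROUGE-N score can be computed from n-gram overlap. It is not the scoring code behind the table: the README does not say which ROUGE implementation or variant was used, so the F1 formulation, the lowercase whitespace tokenization, the absence of stemming, and the `ngrams` and `rouge_n` helper names are all assumptions made for illustration. The reference and candidate sentences are hypothetical as well.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Simplified ROUGE-N F1 on a 0-100 scale.

    Assumption: lowercase whitespace tokenization and no stemming, so the
    values will not match an official ROUGE implementation exactly.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical reference/candidate pair, for illustration only.
reference = "NCB agreed to buy Samba for $15 billion"
candidate = "NCB to buy Samba in $15 billion deal"
print(round(rouge_n(reference, candidate, n=1), 2))  # 75.0
print(round(rouge_n(reference, candidate, n=2), 2))  # 42.86
```

To reproduce or compare against the reported numbers, an established ROUGE package is the better choice, since stemming and tokenization choices can shift scores noticeably.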


## Citation

You can find more details about this work in the following workshop paper. If you use our model in your research, please consider citing our paper:

> T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021.
> Towards Human-Centered Summarization: A Case Study on Financial News.
> In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.

BibTeX entry:

```
@inproceedings{passali-etal-2021-towards,
    title = "Towards Human-Centered Summarization: A Case Study on Financial News",
    author = "Passali, Tatiana and Gidiotis, Alexios and Chatzikyriakidis, Efstathios and Tsoumakas, Grigorios",
    booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.hcinlp-1.4",
    pages = "21--27",
}
```

## Support

Contact us at [[email protected]](mailto:[email protected]) if you are interested in a more sophisticated version of the model, trained on more articles and adapted to your needs!

More information about Medoid AI:
- Website: [https://www.medoid.ai](https://www.medoid.ai)
- LinkedIn: [https://www.linkedin.com/company/medoid-ai/](https://www.linkedin.com/company/medoid-ai/)