return only harim+ scores
- README.md +3 -4
- harim_plus.py +13 -17
README.md
CHANGED
@@ -4,7 +4,7 @@ emoji: 🤗
 colorFrom: blue
 colorTo: red
 sdk: gradio
-sdk_version: 3.
 app_file: app.py
 pinned: false
 tags:
@@ -34,7 +34,6 @@ pip install evaluate
 import evaluate
 from pprint import pprint

-# example from the paper
art = """Spain's 2-0 defeat by Holland on Tuesday brought back bitter memories of their disastrous 2014 World Cup, but coach Vicente del Bosque will not be too worried about a third straight friendly defeat, insists Gerard Pique. Holland, whose 5-1 drubbing of Spain in the group stage in Brazil last year marked the end of the Iberian nation's six-year domination of the world game, scored two early goals at the Amsterdam Arena and held on against some determined Spain pressure in the second half for a 2-0 success. They became the first team to inflict two defeats on Del Bosque since he took over in 2008 but the gruff 64-year-old had used the match to try out several new faces and he fielded a largely experimental, second-string team. Stefan de Vrij (right) headed Holland in front against Spain at the Amsterdam Arena on Tuesday Gerard Pique (left) could do nothing to stop Davy Klaassen doubling the Dutch advantage Malaga forward Juanmi and Sevilla midfielder Vitolo became the 55th and 56th players to debut under Del Bosque, while the likes of goalkeeper David de Gea, defenders Raul Albiol, Juan Bernat and Dani Carvajal and midfielder Mario Suarez all started the game. 'The national team's state of health is good,' centre back Gerard Pique told reporters. 'We are in a process where players are coming into the team and gathering experience,' added the Barcelona defender. 'We are second in qualifying (for Euro 2016) and these friendly games are for experimenting. 'I am not that worried about this match because we lost friendlies in previous years and then ended up winning titles.' David de Gea was given a start by Vicente del Bosque but could not keep out De Vrij's header here Dani Carvajal (centre) was another squad player given a chance to impress against Holland Del Bosque will be confident he can find the right mix of players to secure Spain's berth at Euro 2016 in France next year, when they will be chasing an unprecedented third straight title. 
Slovakia are the surprise leaders in qualifying Group C thanks to a 2-1 win over Spain in Zilina in October and have a maximum 15 points from five of 10 matches. Spain are second on 12 points, three ahead of Ukraine, who they beat 1-0 in Seville on Friday. Del Bosque's side host Slovakia in September in a match that could decide who goes through to the finals as group winners. 'The team is in good shape,' forward Pedro told reporters. 'We have a very clear idea of our playing style and we are able to count on people who are gradually making a place for themselves in the team.'"""

 summaries = [
@@ -46,8 +45,8 @@ summaries = [
 articles = [art] * len(summaries)

 scorer = evaluate.load('NCSOFT/harim_plus')
-scores = scorer.compute(predictions = summaries, references = articles) # use_aggregator=False,
-pprint(scores
 >>> [1.8230078220367432,
 1.5361897945404053,
 1.806436538696289,
@@ -4,7 +4,7 @@ emoji: 🤗
 colorFrom: blue
 colorTo: red
 sdk: gradio
+sdk_version: 3.0.2
 app_file: app.py
 pinned: false
 tags:
@@ -34,7 +34,6 @@ pip install evaluate
 import evaluate
 from pprint import pprint

art = """Spain's 2-0 defeat by Holland on Tuesday brought back bitter memories of their disastrous 2014 World Cup, but coach Vicente del Bosque will not be too worried about a third straight friendly defeat, insists Gerard Pique. Holland, whose 5-1 drubbing of Spain in the group stage in Brazil last year marked the end of the Iberian nation's six-year domination of the world game, scored two early goals at the Amsterdam Arena and held on against some determined Spain pressure in the second half for a 2-0 success. They became the first team to inflict two defeats on Del Bosque since he took over in 2008 but the gruff 64-year-old had used the match to try out several new faces and he fielded a largely experimental, second-string team. Stefan de Vrij (right) headed Holland in front against Spain at the Amsterdam Arena on Tuesday Gerard Pique (left) could do nothing to stop Davy Klaassen doubling the Dutch advantage Malaga forward Juanmi and Sevilla midfielder Vitolo became the 55th and 56th players to debut under Del Bosque, while the likes of goalkeeper David de Gea, defenders Raul Albiol, Juan Bernat and Dani Carvajal and midfielder Mario Suarez all started the game. 'The national team's state of health is good,' centre back Gerard Pique told reporters. 'We are in a process where players are coming into the team and gathering experience,' added the Barcelona defender. 'We are second in qualifying (for Euro 2016) and these friendly games are for experimenting. 'I am not that worried about this match because we lost friendlies in previous years and then ended up winning titles.' David de Gea was given a start by Vicente del Bosque but could not keep out De Vrij's header here Dani Carvajal (centre) was another squad player given a chance to impress against Holland Del Bosque will be confident he can find the right mix of players to secure Spain's berth at Euro 2016 in France next year, when they will be chasing an unprecedented third straight title. 
Slovakia are the surprise leaders in qualifying Group C thanks to a 2-1 win over Spain in Zilina in October and have a maximum 15 points from five of 10 matches. Spain are second on 12 points, three ahead of Ukraine, who they beat 1-0 in Seville on Friday. Del Bosque's side host Slovakia in September in a match that could decide who goes through to the finals as group winners. 'The team is in good shape,' forward Pedro told reporters. 'We have a very clear idea of our playing style and we are able to count on people who are gradually making a place for themselves in the team.'"""

 summaries = [
@@ -46,8 +45,8 @@ summaries = [
 articles = [art] * len(summaries)

 scorer = evaluate.load('NCSOFT/harim_plus')
+scores = scorer.compute(predictions = summaries, references = articles) # use_aggregator=False, bsz=32, return_details=False, tokenwise_score=False)
+pprint(scores)
 >>> [1.8230078220367432,
 1.5361897945404053,
 1.806436538696289,
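Since the README example now prints a plain list of scores, one natural follow-up is to rank the summaries by score. The snippet below is a minimal sketch of that step, assuming that a higher HaRiM+ score indicates a better summary; `rank_by_score` is a hypothetical helper, not part of the metric, and the numbers are the first three values shown in the README output.

```python
def rank_by_score(scores):
    """Return summary indices ordered from highest to lowest score.

    Hypothetical helper for illustration; not part of harim_plus.py.
    """
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# First three scores copied from the README output.
scores = [1.8230078220367432, 1.5361897945404053, 1.806436538696289]

ranking = rank_by_score(scores)
print(ranking)  # [0, 2, 1]
```

Ranking by relative order, rather than comparing against a fixed threshold, fits a score whose range the removed description called unbounded.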
harim_plus.py
CHANGED
@@ -30,14 +30,11 @@ _CITATION = """\
 }
 """

-_DESCRIPTION = """
-
-It will work great ranking the summary-article pairs according to its quality.
-Note that the score range is unbound.

-
-
-HaRiM+ is proved effective for benchmarking summarization systems (system-level performance) as well as ranking the article-summary pairs (segment-level performance) in comprehensive aspect such as factuality, consistency, coherency, fluency, and relevance. For details, refer to our paper published in AACL2022.
 """

 _KWARGS_DESCRIPTION = """
@@ -51,14 +48,12 @@ Args:
 `predictions` (list of str): generated summaries
 `references` (list of str): source articles to be summarized
 `use_aggregator` (bool): if True, average of the scores are returned

 Returns:
-'results' (
-'harim+' (List[float] or float): HaRiM+ score to use,
-'harim' (List[float] or float): HaRiM term for computing the score above,
-'log_ppl' (List[float] or float): Log perplexity term. Same as (Yuan et al., NeurIPS 2021),
-'lambda' (float): (recommend not to modify this) Balancing coeff. for computing harim+ from harim and log_ppl.
-}

 Examples:
 >>> summaries = ["hello there", "hello there"]
@@ -94,8 +89,8 @@ class Harimplus(evaluate.Metric):
 inputs_description=_KWARGS_DESCRIPTION,
 features=datasets.Features(
 {
-"predictions": datasets.Value("string", id="sequence"),
-"references": datasets.Value("string", id="sequence"),
 }
 ),
 codebase_urls=[CODEBASE_URL],
@@ -124,8 +119,9 @@ class Harimplus(evaluate.Metric):
 references=None,
 use_aggregator=False,
 bsz=32,
-tokenwise_score=False
 summaries = predictions
 articles = references
-scores = self.scorer.compute(predictions=summaries, references=articles, use_aggregator=use_aggregator, bsz=bsz, tokenwise_score=tokenwise_score)
 return scores
@@ -30,14 +30,11 @@ _CITATION = """\
 }
 """

+_DESCRIPTION = f"""HaRiM+ is a reference-less evaluation metric (i.e. requires only article-summary pair, no reference summary) for summarization which hurls the power of summarization model.
+Summarization model inside the HaRiM+ will read and evaluate how good the quality of a summary given the paired article.
+It will work great for ranking the summary-article pairs according to its quality.

+HaRiM+ is proved effective for benchmarking summarization systems (system-level performance) as well as ranking the article-summary pairs (segment-level performance) in comprehensive aspect such as factuality, consistency, coherency, fluency, and relevance. For details, refer to our [paper]({PAPER_URL}) published in AACL2022.
 """

 _KWARGS_DESCRIPTION = """
@@ -51,14 +48,12 @@ Args:
 `predictions` (list of str): generated summaries
 `references` (list of str): source articles to be summarized
 `use_aggregator` (bool): if True, average of the scores are returned
+`bsz` (int): batch size for harim to iterate through the given pairs
+`return_details` (bool): whether to show more than harim+ score (returns logppl, harim term. refer to the paper for detail)
+`tokenwise_score` (bool): whether to show tokenwise scores for input pairs (if return_details=False, this is ignored)

 Returns:
+'results' (list of float): harim+ score for each summary-article pair

 Examples:
 >>> summaries = ["hello there", "hello there"]
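The `bsz` and `use_aggregator` arguments documented in this hunk can be pictured with plain Python. This is a rough sketch, not the metric's implementation: `iter_batches` and `aggregate` are hypothetical helpers, and it assumes the aggregator is a simple arithmetic mean, as the docstring's "average of the scores" wording suggests.

```python
from statistics import mean


def iter_batches(pairs, bsz=32):
    """Yield consecutive chunks of at most `bsz` (summary, article) pairs.

    Hypothetical helper; the real batching happens inside the scorer.
    """
    for start in range(0, len(pairs), bsz):
        yield pairs[start:start + bsz]


def aggregate(scores, use_aggregator=False):
    """Return the mean if `use_aggregator` is True, else the per-pair list.

    Assumes "average" in the docstring means an arithmetic mean.
    """
    return mean(scores) if use_aggregator else list(scores)


# Five placeholder pairs split with a batch size of 2.
pairs = [("summary", "article")] * 5
print([len(batch) for batch in iter_batches(pairs, bsz=2)])  # [2, 2, 1]
print(aggregate([1.0, 2.0, 3.0], use_aggregator=True))       # 2.0
```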
@@ -94,8 +89,8 @@ class Harimplus(evaluate.Metric):
 inputs_description=_KWARGS_DESCRIPTION,
 features=datasets.Features(
 {
+"predictions (summaries)": datasets.Value("string", id="sequence"),
+"references (articles)": datasets.Value("string", id="sequence"),
 }
 ),
 codebase_urls=[CODEBASE_URL],
@@ -124,8 +119,9 @@ class Harimplus(evaluate.Metric):
 references=None,
 use_aggregator=False,
 bsz=32,
+tokenwise_score=False,
+return_details=False):
 summaries = predictions
 articles = references
+scores = self.scorer.compute(predictions=summaries, references=articles, use_aggregator=use_aggregator, bsz=bsz, tokenwise_score=tokenwise_score, return_details=return_details)
 return scores
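The signature change in this hunk, read together with the updated docstring, suggests that `return_details=False` yields only the HaRiM+ scores, while `return_details=True` also exposes the other terms. The sketch below illustrates that branching under stated assumptions: `select_output` and the `details` dictionary are hypothetical, the keys are borrowed from the removed `Returns:` docstring, and the `tokenwise` key is an invented placeholder. The actual logic lives in the scorer, which this diff does not show.

```python
def select_output(details, return_details=False, tokenwise_score=False):
    """Hypothetical helper: choose what to return from a dict of score terms."""
    if not return_details:
        # Default after this commit: only the HaRiM+ scores.
        # tokenwise_score is ignored in this branch, as the docstring states.
        return details["harim+"]
    out = {key: details[key] for key in ("harim+", "harim", "log_ppl", "lambda")}
    if tokenwise_score:
        # "tokenwise" is a placeholder key, not taken from the source.
        out["tokenwise"] = details.get("tokenwise")
    return out


# Placeholder numbers for illustration only, not real metric values.
details = {
    "harim+": [1.0, 2.0],
    "harim": [0.1, 0.2],
    "log_ppl": [0.5, 0.6],
    "lambda": 1.0,
    "tokenwise": [[0.1], [0.2]],
}
print(select_output(details))  # [1.0, 2.0]
```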