Overview
BEM (BERT Matching) is the model from the paper "Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation" (this is a reproduction).
It is a bert-base-uncased model fine-tuned on the Answer Equivalence dataset.
Consider this example (pseudocode):
question = 'how is the weather in california'
reference answer = 'infrequent rain'
candidate answer = 'rain'
bem(question, reference, candidate) ~ 0  # "rain" alone is not equivalent to "infrequent rain"
This model can be used as a metric to evaluate automatic question answering systems: even when the produced answer differs from the reference at the surface level, it may still be equivalent to the reference and hence count as correct.
See the paper "Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation" for a detailed explanation of how the data was collected and how this metric compares to others such as exact match or F1.
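For context, the token-level baselines mentioned above can be sketched as follows. This is the standard SQuAD-style exact match and token F1 with simplified answer normalization (lowercasing and whitespace splitting only); the helper names are illustrative, not from the paper:

from collections import Counter

def exact_match(reference: str, candidate: str) -> float:
    # 1.0 only if the normalized strings are identical.
    return float(reference.strip().lower() == candidate.strip().lower())

def token_f1(reference: str, candidate: str) -> float:
    # Harmonic mean of token-level precision and recall.
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    num_same = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(cand_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# On the example above: EM = 0.0 and F1 ~ 0.67 for "rain" vs "infrequent rain",
# even though the answers are not equivalent; BEM is meant to catch such cases.
print(exact_match("infrequent rain", "rain"), token_f1("infrequent rain", "rain"))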
Example use
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from torch.nn import functional as F
tokenizer = AutoTokenizer.from_pretrained("kortukov/answer-equivalence-bem")
model = AutoModelForSequenceClassification.from_pretrained("kortukov/answer-equivalence-bem")
question = "What does Ban Bossy encourage?"
reference = "leadership in girls"
candidate = "positions of power"
def tokenize_function(question, reference, candidate):
    # BEM input format: [CLS] candidate [SEP] reference [SEP] question [SEP]
    text = f"[CLS] {candidate} [SEP]"
    text_pair = f"{reference} [SEP] {question} [SEP]"
    return tokenizer(text=text, text_pair=text_pair, add_special_tokens=False,
                     padding='max_length', truncation=True, return_tensors='pt')

inputs = tokenize_function(question, reference, candidate)
out = model(**inputs)
# Class index 1 corresponds to "equivalent"; argmax gives a hard 0/1 prediction.
prediction = F.softmax(out.logits, dim=-1).argmax().item()
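If a graded score is more useful than a hard label, one option is to take the softmax probability of the "equivalent" class directly (assuming, as above, that class index 1 means "equivalent"). A minimal sketch reusing the objects defined above; the helper name bem_score is illustrative:

import torch

def bem_score(question, reference, candidate):
    inputs = tokenize_function(question, reference, candidate)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Probability assigned to class 1 ("equivalent").
    return F.softmax(logits, dim=-1)[0, 1].item()

print(bem_score(question, reference, candidate))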