Overview
BEM (BERT Matching) is the model from the paper "Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation" (this is a reproduction).
It is a bert-base-uncased model fine-tuned on the Answer Equivalence dataset.
Consider this example (pseudocode):
question = 'how is the weather in california'
reference answer = 'infrequent rain'
candidate answer = 'rain'
bem(question, reference, candidate) ~ 0  # "rain" alone is not equivalent to "infrequent rain"
This model can be used as a metric to evaluate automatic question answering systems: even when the produced answer differs from the reference at the surface level, it may still be equivalent to the reference and hence count as correct.
See the paper "Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation" for a detailed explanation of how the data was collected and how this metric compares to others such as exact match or F1.
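For context, the token-level baselines mentioned above can be sketched as follows. This is the standard SQuAD-style exact match and token F1 with simplified answer normalization (lowercasing and whitespace splitting only); the helper names are illustrative, not from the paper:

from collections import Counter

def exact_match(reference: str, candidate: str) -> float:
    # 1.0 only if the normalized strings are identical.
    return float(reference.strip().lower() == candidate.strip().lower())

def token_f1(reference: str, candidate: str) -> float:
    # Harmonic mean of token-level precision and recall.
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    num_same = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(cand_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# On the example above: EM = 0.0 and F1 ~ 0.67 for "rain" vs "infrequent rain",
# even though the answers are not equivalent; BEM is meant to catch such cases.
print(exact_match("infrequent rain", "rain"), token_f1("infrequent rain", "rain"))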
Example use
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from torch.nn import functional as F
tokenizer = AutoTokenizer.from_pretrained("kortukov/answer-equivalence-bem")
model = AutoModelForSequenceClassification.from_pretrained("kortukov/answer-equivalence-bem")
question = "What does Ban Bossy encourage?"
reference = "leadership in girls"
candidate = "positions of power"
def tokenize_function(question, reference, candidate):
    # BEM input format: [CLS] candidate [SEP] reference [SEP] question [SEP]
    text = f"[CLS] {candidate} [SEP]"
    text_pair = f"{reference} [SEP] {question} [SEP]"
    return tokenizer(text=text, text_pair=text_pair, add_special_tokens=False,
                     padding='max_length', truncation=True, return_tensors='pt')

inputs = tokenize_function(question, reference, candidate)
out = model(**inputs)
# Class index 1 corresponds to "equivalent"; argmax gives a hard 0/1 prediction.
prediction = F.softmax(out.logits, dim=-1).argmax().item()
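If a graded score is more useful than a hard label, one option is to take the softmax probability of the "equivalent" class directly (assuming, as above, that class index 1 means "equivalent"). A minimal sketch reusing the objects defined above; the helper name bem_score is illustrative:

import torch

def bem_score(question, reference, candidate):
    inputs = tokenize_function(question, reference, candidate)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Probability assigned to class 1 ("equivalent").
    return F.softmax(logits, dim=-1)[0, 1].item()

print(bem_score(question, reference, candidate))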