BERT BASE (cased) fine-tuned on Bulgarian natural language inference data

This is a BERT model pretrained on Bulgarian text using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. The model is cased: it makes a difference between "bulgarian" and "Bulgarian". The pretraining data is Bulgarian text from OSCAR, Chitanka and Wikipedia.
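
Because the model is cased, differently cased inputs tokenize differently. A quick check with the tokenizer (a sketch, not from the original card):

>>> from transformers import AutoTokenizer
>>> tok = AutoTokenizer.from_pretrained('rmihaylov/bert-base-nli-theseus-bg')
>>> # A cased tokenizer preserves capitalization, so these token sequences differ
>>> tok.tokenize('Bulgarian') == tok.tokenize('bulgarian')
False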

It was then fine-tuned on private Bulgarian NLI data.

Finally, it was compressed via progressive module replacing, the BERT-of-Theseus approach, as sketched below.
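
Progressive module replacing trains a smaller successor model by stochastically swapping groups of the original (predecessor) encoder layers for lighter successor modules during fine-tuning, raising the replacement probability over time until only the successor remains. The following is a minimal PyTorch sketch of the idea; the names (TheseusEncoder, replace_prob) are illustrative, not the authors' actual training code:

import random
import torch
from torch import nn

class TheseusEncoder(nn.Module):
    """Stochastically mixes predecessor and successor modules (illustrative)."""

    def __init__(self, predecessor_modules, successor_modules, replace_prob=0.5):
        super().__init__()
        assert len(predecessor_modules) == len(successor_modules)
        self.predecessor = nn.ModuleList(predecessor_modules)  # frozen, e.g. 2 BERT layers per module
        self.successor = nn.ModuleList(successor_modules)      # trainable, e.g. 1 layer per module
        self.replace_prob = replace_prob  # raised towards 1.0 over training

    def forward(self, hidden_states):
        for pred, succ in zip(self.predecessor, self.successor):
            if self.training and random.random() > self.replace_prob:
                # Keep the original module for this step; its weights stay frozen.
                with torch.no_grad():
                    hidden_states = pred(hidden_states)
            else:
                # Use the smaller successor module, which receives gradients.
                hidden_states = succ(hidden_states)
        return hidden_states

Once replace_prob reaches 1.0 (and at inference, where only the successor branch runs), the predecessor is gone entirely, yielding the compressed model published here.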

How to use

Here is how to use this model in PyTorch:

>>> import torch
>>> from transformers import AutoModelForSequenceClassification, AutoTokenizer
>>>
>>> model_id = 'rmihaylov/bert-base-nli-theseus-bg'
>>> model = AutoModelForSequenceClassification.from_pretrained(model_id)
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>>
>>> # Premise: "Several boys are playing football."
>>> # Hypothesis: "Several girls are playing football."
>>> inputs = tokenizer.encode_plus(
...     'Няколко момчета играят футбол.',
...     'Няколко момичета играят футбол.',
...     return_tensors='pt')
>>>
>>> outputs = model(**inputs)
>>> # Softmax over the three NLI logits
>>> contradiction, neutral, entailment = torch.softmax(outputs[0][0], dim=0).detach()
>>> contradiction, neutral, entailment
(tensor(0.9998), tensor(0.0001), tensor(5.9929e-05))
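
Equivalently, the text-classification pipeline handles the sentence pair, softmax, and label mapping in one call. A brief sketch, assuming the model's config provides the id2label mapping (the label names in the output come from that config); top_k=None, supported in recent transformers versions, requests scores for all labels:

>>> from transformers import pipeline
>>>
>>> nli = pipeline('text-classification', model=model_id, top_k=None)
>>> nli({'text': 'Няколко момчета играят футбол.',
...      'text_pair': 'Няколко момичета играят футбол.'})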