BERT-base model for pair ranking (a reward model for RLHF) in Russian.

For training I use a pair-ranking loss. The model is based on ruBert-base.
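The exact loss formula is not given on this page. As a minimal sketch, assuming the widely used pairwise formulation `-log(sigmoid(score_chosen - score_rejected))`, the idea can be illustrated for a single pair as follows. The function name `pair_ranking_loss` and the example scores are hypothetical, and the loss actually used in training may differ.

```python
import math


def pair_ranking_loss(score_chosen: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_chosen - score_rejected)) for one pair.

    Assumed formulation for illustration; not confirmed by the model card.
    """
    margin = score_chosen - score_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() is only ever called on a non-positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores in (0, 1), as produced by a sigmoid output head.
well_ordered = pair_ranking_loss(0.9, 0.1)  # preferred answer scored higher
mis_ordered = pair_ranking_loss(0.1, 0.9)   # preferred answer scored lower
print(well_ordered < mis_ordered)
```

Under this formulation the loss shrinks as the preferred response is scored further above the rejected one, which is what pushes the model to order pairs correctly.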

The datasets used for reward training were translated with google-translate-api:

First, download the custom model locally. You can do this manually.

OR:

OR see this manual

Usage (HuggingFace Models Repository)

You can use the model directly from the model repository to compute a score:

# Use the custom model class:

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class RewardModel(nn.Module):
    def __init__(self, model_name):
        super().__init__()
        self.checkpoint = model_name
        # return_dict=False makes the encoder return a plain tuple
        self.bert = AutoModel.from_pretrained(model_name,
                                              return_dict=False)
        self.layer_norm = nn.LayerNorm(768)
        self.dropout = nn.Dropout(0.3)
        # Scoring head: maps the 768-dim [CLS] vector to a score in (0, 1)
        self.dense = nn.Sequential(
            nn.Linear(768, 512),
            nn.LeakyReLU(negative_slope=0.01),
            nn.Dropout(0.3),
            nn.Linear(512, 1),
            nn.Sigmoid()
        )

    def forward(self, input_ids, token_type_ids, attention_mask):
        model_output = self.bert(input_ids=input_ids,
                                 token_type_ids=token_type_ids,
                                 attention_mask=attention_mask)

        last_hidden_states = model_output[0]
        # Use the hidden state of the first ([CLS]) token as the pooled output
        pooled_output = last_hidden_states[:, 0]
        pooled_output = self.layer_norm(pooled_output)
        pooled_output = self.dropout(pooled_output)
        preds = self.dense(pooled_output)
        return preds


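# Create the model object and load the pretrained weights:
reward_name = "ai-forever/ruBert-base"
tokenizer = AutoTokenizer.from_pretrained(reward_name)
model = RewardModel(reward_name)
# map_location="cpu" lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load('./ruBert-base-reward/pytorch_model.bin',
                                 map_location="cpu"))
model.eval()  # disable dropout so that scores are deterministic
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")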

# Sentences that we want to score
# ("Human: What is a QR code?", "Assistant: A QR code is a type of matrix barcode."):
sentences = ['Человек: Что такое QR-код?', 'Ассистент: QR-код - это тип матричного штрих-кода.']

# Compute the reward score:
with torch.no_grad():
    model.to(device)

    # Encode the two sentences as a single text pair
    encoded_input = tokenizer(sentences[0], sentences[1],
                              truncation=True,
                              add_special_tokens=True,
                              max_length=512,
                              padding='max_length',
                              return_tensors='pt')

    encoded_input = encoded_input.to(device)
    score = model(**encoded_input).cpu().flatten().numpy()
    print(score)
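
Because the model scores a (prompt, response) pair, one natural use is picking the best of several candidate responses. The sketch below ranks candidates with a generic scoring callable. The names `rank_responses`, `score_fn`, and `dummy_score` are hypothetical and not part of this repository; the dummy scorer only stands in for a wrapper that tokenizes the pair and calls the reward model as shown above.

```python
from typing import Callable, List, Tuple


def rank_responses(prompt: str,
                   responses: List[str],
                   score_fn: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score each (prompt, response) pair and return the pairs sorted best-first."""
    scored = [(response, score_fn(prompt, response)) for response in responses]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Stand-in scorer for illustration only: it rewards longer responses.
# In practice score_fn would tokenize (prompt, response) and call the reward model.
def dummy_score(prompt: str, response: str) -> float:
    return min(len(response) / 100.0, 1.0)


candidates = [
    'Ассистент: Не знаю.',
    'Ассистент: QR-код - это тип матричного штрих-кода.',
]
ranked = rank_responses('Человек: Что такое QR-код?', candidates, dummy_score)
best_response, best_score = ranked[0]
print(best_response, best_score)
```

Passing the scorer as a callable keeps the ranking logic independent of how the model is loaded, so the same helper can be reused with batching or a different device setup.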

Authors
