NaverHustQA/viLegal_cross_Quang

This is a cross-encoder model for the Vietnamese legal domain. It returns a relevance score for a query-context pair and can be used for information retrieval.

We use vinai/phobert-base-v2 as the pre-trained backbone.

Usage (HuggingFace Transformers)

You can use the model as shown below (remember to word-segment the inputs first):

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load cross-encoder
model_name = "NaverHustQA/viLegal_cross_Quang"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Define query and context
query = "Uống rượu lái_xe bị phạt bao_nhiêu tiền ?"
context = "Uống rượu lái_xe bị phạt 500,000 đồng ."

# Tokenize input (Cross-encoder format: query and context as a single input)
inputs = tokenizer(query, context, return_tensors="pt", padding=True, truncation=True)

# Run through model
with torch.no_grad():
    outputs = model(**inputs)
    score = outputs.logits.item()  # Single logit for the pair, used as the relevance score

print(f"Relevance Score: {score}")
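Since the model scores one query-context pair at a time, a typical retrieval use is to rerank several candidate contexts for the same query. The following is a minimal sketch of that idea, not the authors' pipeline. The helper names `rank_contexts`, `make_cross_encoder_scorer`, and `toy_overlap_scorer` are hypothetical, the first candidate context is a made-up example, and the model-backed scorer assumes a single-logit classification head (as the `.item()` call above suggests). The demo at the bottom uses the toy scorer so that it runs without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple

ScoreFn = Callable[[str, Sequence[str]], List[float]]


def rank_contexts(query: str, contexts: Sequence[str], score_fn: ScoreFn) -> List[Tuple[str, float]]:
    """Score every (query, context) pair and return the contexts sorted best-first."""
    if not contexts:
        return []
    scores = score_fn(query, contexts)
    return sorted(zip(contexts, scores), key=lambda pair: pair[1], reverse=True)


def make_cross_encoder_scorer(model_name: str = "NaverHustQA/viLegal_cross_Quang") -> ScoreFn:
    """Build a score_fn backed by the cross-encoder (downloads the model on first use)."""
    # Imported lazily so the ranking helper can be used without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    def score_fn(query: str, contexts: Sequence[str]) -> List[float]:
        # Pair the same query with each context and score all pairs in one batch.
        inputs = tokenizer(
            [query] * len(contexts),
            list(contexts),
            return_tensors="pt",
            padding=True,
            truncation=True,
        )
        with torch.no_grad():
            logits = model(**inputs).logits
        # Assumption: one logit per pair, i.e. logits has shape [batch, 1].
        return logits.squeeze(-1).tolist()

    return score_fn


def toy_overlap_scorer(query: str, contexts: Sequence[str]) -> List[float]:
    """Stand-in scorer for the demo: counts tokens shared with the query (not the real model)."""
    query_tokens = set(query.split())
    return [float(len(query_tokens & set(context.split()))) for context in contexts]


# Inputs are already word-segmented, as the model expects.
query = "Uống rượu lái_xe bị phạt bao_nhiêu tiền ?"
contexts = [
    "Không đội mũ_bảo_hiểm bị phạt bao_nhiêu ?",  # made-up distractor
    "Uống rượu lái_xe bị phạt 500,000 đồng .",
]

ranked = rank_contexts(query, contexts, toy_overlap_scorer)
for context, score in ranked:
    print(f"{score:.1f}\t{context}")
```

To rerank with the actual model, pass `make_cross_encoder_scorer()` as `score_fn` in place of `toy_overlap_scorer`. The raw logits are only meaningful relative to each other for the same query.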

Training

Full details of our training methods and datasets are available in our reports.

Authors

Le Thanh Huong, Nguyen Nhat Quang.

Model size: 135M params (Safetensors, tensor type F32)