---
pipeline_tag: sentence-similarity
tags:
- cross-encoder
- sentence-similarity
- transformers
- legal
- reranker
library_name: generic
language:
- vi
---

# NaverHustQA/viLegal_cross_Quang

This is a cross-encoder model for the Vietnamese legal domain. It returns a relevance score for a query-context pair and can be used for information retrieval, for example to rerank retrieved passages.

We use [vinai/phobert-base-v2](https://huggingface.co/vinai/phobert-base-v2) as the pre-trained backbone.

## Usage (HuggingFace Transformers)

Use the model as shown below. Inputs must be word-segmented first, as the PhoBERT backbone expects (for example `lái_xe`, `bao_nhiêu`):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the cross-encoder and its tokenizer
model_name = "NaverHustQA/viLegal_cross_Quang"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Define query and context (both already word-segmented)
query = "Uống rượu lái_xe bị phạt bao_nhiêu tiền ?"
context = "Uống rượu lái_xe bị phạt 500,000 đồng ."

# Tokenize input (cross-encoder format: query and context as a single input).
# PhoBERT accepts at most 256 tokens, so longer pairs are truncated.
inputs = tokenizer(
    query, context, return_tensors="pt", padding=True, truncation=True, max_length=256
)

# Run through the model without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    score = outputs.logits.item()  # Single logit used as the relevance score

print(f"Relevance Score: {score}")
```
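
For retrieval, the relevance score is usually used to reorder a list of candidate passages. The block below is a minimal sketch of that step, not the authors' pipeline: `rerank` and `make_scorer` are hypothetical helper names introduced here, and the scorer assumes the model emits a single logit per pair, as the `.item()` call above implies.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    contexts: List[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, context) pair and return the top_k, best first."""
    scored = [(context, score_pair(query, context)) for context in contexts]
    # Higher score means more relevant; sort is stable, so ties keep input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def make_scorer(
    model_name: str = "NaverHustQA/viLegal_cross_Quang",
) -> Callable[[str, str], float]:
    """Build a scorer backed by the cross-encoder (hypothetical helper)."""
    # Imported lazily so rerank() can be used without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    def score_pair(query: str, context: str) -> float:
        # Assumption: inputs are already word-segmented and the model has one output logit.
        inputs = tokenizer(
            query, context, return_tensors="pt", truncation=True, max_length=256
        )
        with torch.no_grad():
            return model(**inputs).logits.item()

    return score_pair
```

With this in place, `rerank(query, candidate_contexts, make_scorer())` would return the highest-scoring contexts paired with their scores. Passing the scorer as a callable keeps the ranking logic independent of the model, so it can be checked with a toy scoring function before loading any weights.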

## Training

Full details of our training methods and datasets are available in our reports.

## Authors

Le Thanh Huong, Nguyen Nhat Quang.