---
pipeline_tag: sentence-similarity
tags:
- cross-encoder
- sentence-similarity
- transformers
- legal
- reranker
library_name: generic
language:
- vi
---

# NaverHustQA/viLegal_cross_Quang

This is a cross-encoder model for the Vietnamese legal domain. It returns a relevance score for a query-context pair and can be used for information retrieval, for example to rerank retrieved passages.

We use [vinai/phobert-base-v2](https://huggingface.co/vinai/phobert-base-v2) as the pre-trained backbone.

## Usage (HuggingFace Transformers)

You can use the model as shown below. Remember to word-segment the inputs first, so that multi-syllable words are joined with underscores (e.g. `lái_xe`):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the cross-encoder and its tokenizer
model_name = "NaverHustQA/viLegal_cross_Quang"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Define the query and context (both already word-segmented)
# Query: "How much is the fine for drinking and driving?"
query = "Uống rượu lái_xe bị phạt bao_nhiêu tiền ?"
# Context: "Drinking and driving is fined 500,000 dong."
context = "Uống rượu lái_xe bị phạt 500,000 đồng ."

# Tokenize the input (cross-encoder format: query and context as a single input)
inputs = tokenizer(query, context, return_tensors="pt", padding=True, truncation=True)

# Run the pair through the model without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    score = outputs.logits.item()  # Extract the relevance score from the logits

print(f"Relevance Score: {score}")
```

## Training

Full details of our training methods and datasets are available in our reports.

## Authors

Le Thanh Huong, Nguyen Nhat Quang.
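
## Usage sketch (reranking several contexts)

Since the model scores one query-context pair at a time, using it for retrieval usually means scoring every candidate context and sorting by score. The following is a minimal sketch of that loop, not part of the released code. The names `rerank` and `overlap_score` are hypothetical, and `overlap_score` is only a stand-in scorer so that the snippet runs without downloading the model. In practice you would pass a function that wraps the tokenizer and model call from the usage example and returns the logit as a float.

```python
from typing import Callable, List, Optional, Tuple


def rerank(
    query: str,
    contexts: List[str],
    score_fn: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every (query, context) pair and sort contexts by descending score."""
    scored = [(context, score_fn(query, context)) for context in contexts]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


def overlap_score(query: str, context: str) -> float:
    """Stand-in scorer (NOT the cross-encoder): share of query tokens found in the context."""
    query_tokens = set(query.split())
    context_tokens = set(context.split())
    return len(query_tokens & context_tokens) / max(len(query_tokens), 1)


# Illustrative, already word-segmented inputs
query = "Uống rượu lái_xe bị phạt bao_nhiêu tiền ?"
contexts = [
    "Thời_tiết hôm_nay rất đẹp .",  # unrelated sentence: "The weather is nice today."
    "Uống rượu lái_xe bị phạt 500,000 đồng .",
]

ranked = rerank(query, contexts, overlap_score)
for context, score in ranked:
    print(f"{score:.3f}\t{context}")
```

Keeping the scorer as a plain callable means the same loop works with the cross-encoder, a batched variant of it, or any other relevance function.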