---
license: apache-2.0
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
library_name: transformers
tags:
- sequence-classification
- glue
- mrpc
- bert
- transformers
---

# BERT Paraphrase Detection (GLUE MRPC)
This model is fine-tuned for **paraphrase detection** on the GLUE MRPC dataset. Given a pair of sentences, it predicts whether they are paraphrases of each other (i.e., whether they convey the same meaning). This is a binary classification task with the following labels:

- **1**: Paraphrase
- **0**: Not a paraphrase

## Model Overview

- **Developer**: Parit Kasnal
- **Model Type**: Sequence Classification (Binary)
- **Language(s)**: English
- **Pre-trained Model**: BERT (bert-base-uncased)

## Intended Use

This model is designed to assess whether two sentences convey the same meaning. It can be applied in various scenarios, including:

- **Duplicate Question Detection**: Identifying semantically equivalent questions in QA systems.
- **Plagiarism Detection**: Flagging content that has been copied and rephrased.
- **Summarization Alignment**: Matching sentences in a summary to the corresponding sentences in the source text.

## Example Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("Parit1/dummy")
tokenizer = AutoTokenizer.from_pretrained("Parit1/dummy")

# Move the model to a GPU if one is available and switch to inference mode
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

def make_prediction(text1, text2):
    # Encode the sentence pair and move the tensors to the model's device
    inputs = tokenizer(text1, text2, truncation=True, padding=True, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    # 1 = paraphrase, 0 = not a paraphrase
    return torch.argmax(logits, dim=-1).item()

# Example usage
text1 = "The quick brown fox jumps over the lazy dog."
text2 = "A fast brown fox leaps over a lazy dog."
prediction = make_prediction(text1, text2)
print(f"Prediction: {prediction}")
```
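The tokenizer also accepts lists of sentences, so many pairs can be scored in a single forward pass. The sketch below reuses the `model`, `tokenizer`, and `device` defined above; the example pairs are purely illustrative.

```python
def make_batch_predictions(pairs):
    """Score a list of (sentence1, sentence2) tuples in one forward pass."""
    firsts = [a for a, _ in pairs]
    seconds = [b for _, b in pairs]
    inputs = tokenizer(firsts, seconds, truncation=True, padding=True, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits
    # One 0/1 prediction per pair (1 = paraphrase, 0 = not a paraphrase)
    return torch.argmax(logits, dim=-1).tolist()

pairs = [
    ("He said the meal was delicious.", "He mentioned that the food tasted great."),
    ("The cat sat on the mat.", "Stock prices fell sharply on Monday."),
]
print(make_batch_predictions(pairs))
```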
## Training Details

### Training Data

The model was fine-tuned on the **GLUE MRPC** dataset, which contains pairs of sentences labeled as either paraphrases or not.

### Training Procedure

- **Number of Epochs**: 2
- **Metrics Used**:
  - Accuracy
  - Precision
  - Recall
  - F1 Score

A minimal sketch of this fine-tuning setup is shown below.
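It is illustrative rather than the exact training script: the batch size and learning rate are assumptions, and only the epoch count (2) comes from the procedure described above.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Load the MRPC sentence-pair dataset from the GLUE benchmark
raw = load_dataset("glue", "mrpc")
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

def tokenize(batch):
    # Each MRPC example is a sentence pair with a binary label
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

tokenized = raw.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-uncased", num_labels=2
)

args = TrainingArguments(
    output_dir="bert-mrpc",
    num_train_epochs=2,              # matches the 2 epochs listed above
    per_device_train_batch_size=16,  # assumed value
    learning_rate=2e-5,              # assumed value
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,             # enables dynamic padding via the default collator
)
trainer.train()
```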
#### Training Logs (Summary)

| **Epoch** | **Avg Loss** | **Accuracy** | **Precision** | **Recall** | **F1 Score** |
|-----------|--------------|--------------|---------------|------------|--------------|
| **1**     | 0.5443       | 73.45%       | 72.28%        | 73.45%     | 70.83%       |
| **2**     | 0.2756       | 89.34%       | 89.25%        | 89.34%     | 89.27%       |
## Evaluation

### Performance Metrics

The model's performance was evaluated using the following metrics:

- **Accuracy**: Percentage of correct predictions.
- **Precision**: Proportion of predicted positives that are actually correct.
- **Recall**: Proportion of actual positives that are correctly identified.
- **F1 Score**: The harmonic mean of precision and recall.

These metrics can be computed as in the sketch below.
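It uses scikit-learn on illustrative predictions; the `weighted` averaging is an assumption about how precision, recall, and F1 were aggregated over the two classes.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

preds  = [1, 0, 1, 1]   # illustrative model predictions
labels = [1, 0, 0, 1]   # illustrative gold labels

# "weighted" averaging is an assumption about how the reported scores
# were aggregated across the two classes.
print("accuracy :", accuracy_score(labels, preds))
print("precision:", precision_score(labels, preds, average="weighted"))
print("recall   :", recall_score(labels, preds, average="weighted"))
print("f1       :", f1_score(labels, preds, average="weighted"))
```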
### Test Set Results

| **Epoch** | **Avg Loss** | **Accuracy** | **Precision** | **Recall** | **F1 Score** |
|-----------|--------------|--------------|---------------|------------|--------------|
| **1**     | 0.3976       | 82.60%       | 82.26%        | 82.60%     | 81.93%       |
| **2**     | 0.3596       | 84.80%       | 84.94%        | 84.80%     | 84.87%       |