SecureBERT Phishing Detection Model
This repository hosts a SecureBERT-based model fine-tuned for phishing URL detection on a cybersecurity dataset. The model classifies a URL as either phishing (malicious) or safe (benign).
Model Details
- Model Architecture: SecureBERT (Based on BERT)
- Task: Binary Classification (Phishing vs. Safe)
- Dataset: shashwatwork/web-page-phishing-detection-dataset (11,431 URLs, 88 features)
- Framework: PyTorch & Hugging Face Transformers
- Input Data: URL strings & extracted numerical features
- Number of Classes: 2 (Phishing, Safe)
- Quantization: FP16 (for efficiency)
Usage
Installation
```bash
pip install torch transformers scikit-learn pandas
```
Loading the Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer from a local directory
model_path = "./fine_tuned_SecureBERT"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
model.eval()  # Set model to evaluation mode (disables dropout)

print("SecureBERT model loaded successfully and ready for inference!")
```
Perform Phishing Detection
```python
def predict_url(url):
    # Tokenize the raw URL string
    encoding = tokenizer(url, truncation=True, padding=True, max_length=512, return_tensors="pt")
    # Move the input tensors to the same device as the model
    encoding = {k: v.to(model.device) for k, v in encoding.items()}

    # Run inference without tracking gradients
    with torch.no_grad():
        output = model(**encoding)

    # The predicted class is the index of the larger logit
    predicted_class = torch.argmax(output.logits, dim=1).item()

    # Map the class index to a label (1 = Phishing, 0 = Safe)
    return "Phishing" if predicted_class == 1 else "Safe"


# Example usage
custom_url = "http://example.com/free-gift"
prediction = predict_url(custom_url)
print(f"Predicted label: {prediction}")
```
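`predict_url` returns only the argmax label. Where a tunable trade-off between false positives and missed phishing URLs is wanted, the two logits can be turned into a phishing probability with a softmax and compared against a threshold. The sketch below is plain Python operating on a list of two logits (for example `output.logits[0].tolist()`); the helper names and the 0.5 default threshold are illustrative assumptions, and it assumes index 1 is the phishing class, as in `predict_url`.

```python
import math


def phishing_probability(logits):
    """Softmax over two logits [safe, phishing]; returns P(phishing).

    Assumes index 1 is the phishing class, matching predict_url.
    """
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def label_with_threshold(logits, threshold=0.5):
    """Hypothetical helper: flag as phishing only if P(phishing) >= threshold."""
    return "Phishing" if phishing_probability(logits) >= threshold else "Safe"


# Usage with made-up logits
print(phishing_probability([0.0, 0.0]))         # 0.5 (equal logits)
print(label_with_threshold([-1.2, 2.3]))        # Phishing (P is about 0.97)
print(label_with_threshold([-1.2, 2.3], 0.99))  # Safe (0.97 < 0.99)
```

Lowering the threshold catches more phishing URLs at the cost of more false positives; raising it does the opposite.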
Evaluation Results
After fine-tuning, the model was evaluated on a test set, achieving the following performance:
| Metric | Score |
|---|---|
| Accuracy | 97.2% |
| Precision | 96.8% |
| Recall | 97.5% |
| F1-Score | 97.1% |
| Inference Speed | Fast (optimized with FP16) |
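For reference, the four classification metrics in the table can be computed from true and predicted labels as below. This is a plain-Python sketch with made-up toy labels (1 = phishing, 0 = safe); it is not the evaluation script used for this model, and scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` compute the same quantities. The last lines check that the reported F1 is consistent with the reported precision and recall, since F1 = 2PR / (P + R).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` as the phishing class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical, not from the real test set)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))

# Consistency check on the reported scores: F1 from P = 0.968 and R = 0.975
p, r = 0.968, 0.975
print(round(2 * p * r / (p + r), 3))  # 0.971, matching the reported 97.1%
```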
Fine-Tuning Details
Dataset
The model was trained on the shashwatwork/web-page-phishing-detection-dataset, which consists of 11,431 URLs labeled as either phishing or safe. Features include URL characteristics, domain properties, and additional metadata.
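A minimal sketch of turning raw rows into (URL, label) pairs for fine-tuning. It assumes the dataset's CSV exposes a `url` column and a `status` column whose values are `phishing` or `legitimate`; those column names and values are assumptions, so check them against your copy of the data. The label convention (1 = phishing, 0 = safe) matches `predict_url`. The rows are inlined, made-up examples so the snippet runs without the real file.

```python
import csv
import io

# Made-up sample rows; the real CSV carries many more feature columns
SAMPLE_CSV = """url,length_url,status
http://example.com/login-verify,31,phishing
https://www.example.org/,24,legitimate
"""

# Assumed label mapping, consistent with predict_url (1 = Phishing, 0 = Safe)
LABELS = {"phishing": 1, "legitimate": 0}


def load_url_label_pairs(csv_text):
    """Return a list of (url, label) tuples from CSV text with url/status columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Normalize the status string before mapping it to an integer label
    return [(row["url"], LABELS[row["status"].strip().lower()]) for row in reader]


pairs = load_url_label_pairs(SAMPLE_CSV)
print(pairs)
# [('http://example.com/login-verify', 1), ('https://www.example.org/', 0)]
```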
Training Configuration
- Number of epochs: 5
- Batch size: 16
- Optimizer: AdamW
- Learning rate: 2e-5
- Loss Function: Cross-Entropy
- Evaluation Strategy: Validation at each epoch
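For a two-class head, the cross-entropy loss listed above is the negative log of the softmax probability assigned to the true class. A plain-Python sketch for single examples and for a batch, using the log-sum-exp form for numerical stability; in PyTorch the same quantity is computed by `torch.nn.functional.cross_entropy`, which averages over the batch by default. The function names here are illustrative.

```python
import math


def cross_entropy(logits, label):
    """Loss for one example: -log softmax(logits)[label].

    Uses log-sum-exp with the max subtracted so large logits do not overflow.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def batch_loss(batch_logits, labels):
    """Mean loss over a batch (e.g. a batch of 16, as in the configuration above)."""
    losses = [cross_entropy(lg, y) for lg, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


print(cross_entropy([0.0, 0.0], 1))   # ln 2, about 0.6931: an uninformed prediction
print(cross_entropy([-2.0, 3.0], 1))  # confident and correct: small loss (~0.0067)
print(cross_entropy([-2.0, 3.0], 0))  # confident and wrong: large loss (~5.0067)
```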
Quantization
The model weights were converted to FP16 (half) precision, a lightweight form of quantization that halves the memory needed for the weights relative to FP32 and reduces latency, while maintaining high accuracy.
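A back-of-the-envelope view of the memory saving: FP32 stores each weight in 4 bytes and FP16 in 2. The sketch assumes a hypothetical parameter count of 125 million (roughly the size of a base-size BERT-family encoder; the exact count for this checkpoint is not stated here) and ignores activations and framework overhead.

```python
# Bytes used to store one weight in each precision
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_mb(num_params, dtype):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


NUM_PARAMS = 125_000_000  # hypothetical parameter count, not measured for this model

for dtype in ("fp32", "fp16"):
    print(dtype, weight_memory_mb(NUM_PARAMS, dtype), "MB")
# fp32 500.0 MB
# fp16 250.0 MB
```

In PyTorch, a loaded model can be cast with `model.half()`, or loaded directly in half precision by passing `torch_dtype=torch.float16` to `from_pretrained`; FP16 inference is typically run on a GPU.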
Limitations
- Evasion Techniques: Attackers constantly evolve phishing techniques, which may reduce model effectiveness.
- Dataset Bias: The model was trained on a specific dataset; new phishing tactics may require retraining.
- False Positives: Some legitimate but unusual URLs might be classified as phishing.
Use this fine-tuned SecureBERT model for accurate and efficient phishing detection!