
🔒 SecureBERT Phishing Detection Model

This repository hosts a SecureBERT-based model fine-tuned for phishing URL detection on the shashwatwork/web-page-phishing-detection-dataset. The model classifies each URL as either phishing (malicious) or safe (benign).


📚 Model Details

  • Model Architecture: SecureBERT (Based on BERT)
  • Task: Binary Classification (Phishing vs. Safe)
  • Dataset: shashwatwork/web-page-phishing-detection-dataset (11,431 URLs, 88 features)
  • Framework: PyTorch & Hugging Face Transformers
  • Input Data: URL strings & extracted numerical features
  • Number of Classes: 2 (Phishing, Safe)
  • Quantization: FP16 (for efficiency)

🚀 Usage

Installation

pip install torch transformers scikit-learn pandas

Loading the Model

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer
model_path = "./fine_tuned_SecureBERT"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
model.eval()  # Set model to evaluation mode

print("✅ SecureBERT model loaded successfully and ready for inference!")

🔍 Perform Phishing Detection

def predict_url(url):
    # Tokenize input
    encoding = tokenizer(url, truncation=True, padding=True, max_length=512, return_tensors="pt")
    
    # Perform inference
    with torch.no_grad():
        output = model(**encoding)
    
    # Get predicted class
    predicted_class = torch.argmax(output.logits, dim=1).item()
    
    # Map label
    label = "Phishing" if predicted_class == 1 else "Safe"
    return label

# Example usage
custom_url = "http://example.com/free-gift"
prediction = predict_url(custom_url)
print(f"Predicted label: {prediction}")
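
`predict_url` returns only a hard label. When a confidence score or an adjustable decision threshold is useful, the two logits can be converted to a phishing probability first. The sketch below shows that arithmetic in plain Python. `phishing_probability`, `label_from_logits`, and the 0.5 default threshold are illustrative names and values, not part of the original model code, and the sketch assumes index 1 is the phishing class, as in the label mapping above.

```python
import math

def phishing_probability(logits):
    """Softmax over [safe_logit, phishing_logit]; returns P(phishing)."""
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

def label_from_logits(logits, threshold=0.5):
    """Return 'Phishing' when P(phishing) reaches the threshold (assumed default: 0.5)."""
    return "Phishing" if phishing_probability(logits) >= threshold else "Safe"

# Hypothetical logits, for illustration only
example_logits = [-1.2, 2.3]
print(f"P(phishing) = {phishing_probability(example_logits):.3f}")
print(label_from_logits(example_logits))
print(label_from_logits(example_logits, threshold=0.99))
```

With PyTorch, `torch.softmax(output.logits, dim=1)` yields the same probabilities. Raising the threshold generally trades some recall for fewer false positives.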

📊 Evaluation Results

After fine-tuning, the model was evaluated on a test set, achieving the following performance:

| Metric | Score |
|---|---|
| Accuracy | 97.2% |
| Precision | 96.8% |
| Recall | 97.5% |
| F1-Score | 97.1% |
| Inference Speed | Fast (optimized with FP16) |
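
For reference, the four scores are related through the confusion-matrix counts. The sketch below shows the standard formulas with phishing as the positive class. `classification_metrics` is a hypothetical helper and the counts are made up for illustration; they are not the model's actual confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 with phishing as the positive class.

    Assumes the denominators are nonzero (at least one predicted and one
    actual positive).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of flagged URLs that are truly phishing
    recall = tp / (tp + fn)      # share of phishing URLs that were flagged
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up counts for illustration only
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

In practice, `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`, which compute the same quantities from label arrays.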

🛠️ Fine-Tuning Details

Dataset

The model was trained on the shashwatwork/web-page-phishing-detection-dataset, which consists of 11,431 URLs labeled as either phishing or safe. Its features include URL characteristics, domain properties, and additional metadata.

Training Configuration

  • Number of epochs: 5
  • Batch size: 16
  • Optimizer: AdamW
  • Learning rate: 2e-5
  • Loss Function: Cross-Entropy
  • Evaluation Strategy: Validation at each epoch
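
As a rough sense of scale, the configuration above fixes the number of optimizer updates once the training split size is known. The card does not state the split, so the 80% training fraction below is purely an assumption. `total_optimizer_steps` is a hypothetical helper that also assumes one update per batch, with no gradient accumulation.

```python
import math

def total_optimizer_steps(n_examples, train_fraction, batch_size, epochs):
    """Optimizer updates over training, assuming one update per batch."""
    n_train = int(n_examples * train_fraction)
    steps_per_epoch = math.ceil(n_train / batch_size)  # the last partial batch is kept
    return steps_per_epoch * epochs

# 11,431 URLs, batch size 16 and 5 epochs come from this card;
# the 0.8 training fraction is an assumption for illustration.
steps = total_optimizer_steps(n_examples=11431, train_fraction=0.8, batch_size=16, epochs=5)
print(steps)
```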

Quantization

The model weights are stored in FP16 (half precision), which roughly halves memory usage relative to FP32 and can reduce latency on hardware with native FP16 support, while maintaining high accuracy.
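
To make the trade-off concrete, the sketch below estimates weight storage for the roughly 125M parameters reported for this model, at 4 bytes (FP32) versus 2 bytes (FP16) per parameter. It also uses the half-precision format code `"e"` from Python's `struct` module to show the rounding that FP16 introduces. The helper names are hypothetical, and the estimate covers weights only, not activations or framework overhead.

```python
import struct

def weight_megabytes(n_params, bytes_per_param):
    """Approximate weight storage in MB (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6

def round_to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

n_params = 125_000_000  # approximate parameter count
print(f"FP32 weights: ~{weight_megabytes(n_params, 4):.0f} MB")
print(f"FP16 weights: ~{weight_megabytes(n_params, 2):.0f} MB")
print(f"0.1 stored as FP16: {round_to_fp16(0.1)!r}")
```

In Transformers, passing `torch_dtype=torch.float16` to `from_pretrained` is one common way to load weights in half precision.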


⚠️ Limitations

  • Evasion Techniques: Attackers constantly evolve phishing techniques, which may reduce model effectiveness.
  • Dataset Bias: The model was trained on a specific dataset; new phishing tactics may require retraining.
  • False Positives: Some legitimate but unusual URLs might be classified as phishing.

✅ Use this fine-tuned SecureBERT model for accurate and efficient phishing detection! 🔒🚀

Model size: 125M params (Safetensors) · Tensor type: FP16