jagan-raj committed
Commit 0daa27d · verified · 1 Parent(s): 5bba431

Update README.md

Files changed (1)
  1. README.md +58 -0
README.md CHANGED
@@ -4,7 +4,15 @@ base_model:
  - google-bert/bert-base-uncased
  datasets:
  - zefang-liu/phishing-email-dataset
+ language:
+ - en
+ metrics:
+ - accuracy
+ tags:
+ - security
  ---
+
+
  # PhishMail - BERT Model for Phishing Detection
 
  This repository features a fine-tuned BERT model designed to detect phishing emails.
@@ -31,4 +39,54 @@ The model is trained to classify emails as either phishing or legitimate by anal
  ```bash
  !pip install transformers torch
 
+ ```
+
+ **Step 2:** Loading the Model:
+
+ ```python
+ from transformers import BertForSequenceClassification, BertTokenizer
+ import torch
+
+ # Specify the Hugging Face model repository name
+ model_name = 'jagan-raj/PhishMail'
+
+ # Load the fine-tuned BERT model for phishing detection
+ model = BertForSequenceClassification.from_pretrained(model_name)
+
+ # Load the corresponding tokenizer for the fine-tuned model
+ tokenizer = BertTokenizer.from_pretrained(model_name)
+
+ # Set the model to evaluation mode for inference
+ model.eval()
+
+ ```
+
+ **Step 3:** Using the Model for Predictions:
+
+ ```python
+ # Input the email text for classification
+ email_text = "Your email content here"
+
+ # Tokenize and preprocess the input text
+ # Converts the email text into token IDs, applies truncation/padding, and creates a tensor
+ inputs = tokenizer(
+     email_text,
+     return_tensors="pt",   # Output tensors in PyTorch format
+     truncation=True,       # Truncate the text if it exceeds the model's maximum length
+     padding='max_length'   # Pad the text to the maximum sequence length
+ )
+
+ # Make a prediction using the model
+ with torch.no_grad():  # Disable gradient calculations for faster inference
+     outputs = model(**inputs)  # Get model outputs
+     logits = outputs.logits  # Extract raw prediction scores (logits)
+     predictions = torch.argmax(logits, dim=-1)  # Determine the predicted class (0 or 1)
+
+ # Interpret the prediction result
+ # Map the prediction to its corresponding label: 1 for "Phishing", 0 for "Legitimate"
+ result = "This is a phishing email." if predictions.item() == 1 else "This is a legitimate email."
+
+ # Print the prediction result
+ print(f"Prediction: {result}")
+
  ```
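
Reviewer note: the Step 3 snippet reports only the argmax class, which hides how confident the model is. The sketch below is not part of this commit. It shows one way the two logits could be turned into a label plus a probability, written in plain Python so it runs without downloading the model. The names `LABELS`, `softmax`, and `interpret` are hypothetical, and the logits are made-up placeholder values. The label mapping (1 for phishing, 0 for legitimate) is the one stated in the README comment.

```python
import math

# Label mapping as stated in the README comment: 1 = phishing, 0 = legitimate.
LABELS = {0: "Legitimate", 1: "Phishing"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits):
    """Return (label, confidence) for one email's two-class logits."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax over classes
    return LABELS[predicted], probs[predicted]


# Placeholder logits (an assumption), standing in for outputs.logits[0].tolist()
label, confidence = interpret([-1.2, 2.3])
print(f"Prediction: {label} ({confidence:.1%})")
```

With the real model, the same probabilities can be obtained directly from the tensor via `torch.softmax(logits, dim=-1)`. A confidence value also makes it possible to apply a stricter threshold than a plain argmax when false positives are costly.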