pirocheto/phishing-url-detection

Text Classification, Scikit-learn, ONNX, phishing, url


GitHub Action committed on Nov 18, 2023

Commit d28bf1d (1 parent: 3bf1ffc)

commit from github


Files changed (3)

README.md +82 -12
model.onnx +2 -2
model.pkl +2 -2

README.md CHANGED

@@ -58,9 +58,18 @@ inference: false
 pipeline_tag: tabular-classification
 ---
-# Model description
-## Evaluation Results
 | Metric    |    Value |
 |-----------|----------|
@@ -69,22 +78,21 @@ pipeline_tag: tabular-classification
 | precision | 0.951996 |
 | recall    | 0.938331 |
-# Model Description
-The model predicts the probability that a URL is a phishing site using a list of features.
-- **Developed by:** [pirocheto](https://github.com/pirocheto)
-- **Model type:** Traditional machine learning
-- **Task:** Tabular classification (Binary)
-- **License:** {{ license }}
-- **Repository:** {{ repo }}
-# How to Get Started with the Model
 ## With ONNX (recommended)
 ```python
 import onnxruntime
 import pandas as pd
@@ -129,9 +137,71 @@ for url, proba in zip(data, probas):
     print(f"Likelihood of being a phishing site: {proba[1] * 100:.2f}%")
     print("----")
-# Output:
 # URL: https://www.rga.com/about/workplace
 # Likelihood of being a phishing site: 0.89%
 # ----
 ```

 pipeline_tag: tabular-classification
 ---
+# Model Description
+The model predicts the probability that a URL is a phishing site using a list of features.
+- **Model type:** Traditional machine learning
+- **Task:** Tabular classification (Binary)
+- **License:** MIT
+- **Repository:** https://github.com/pirocheto/phishing-url-detection
+## Evaluation
 | Metric    |    Value |
 |-----------|----------|
 | precision | 0.951996 |
 | recall    | 0.938331 |
+# How to Get Started with the Model
+Loading a pickle file in Python is discouraged because deserialization can execute arbitrary code, which makes code injection possible.
+Pickle files are also not portable across Python versions and cannot be read from other languages.
+Instead, we recommend using the ONNX model, which is more secure.
+It is about half the size of the pickle model and almost twice as fast.
+It can also be used from any language supported by the [ONNX Runtime](https://onnxruntime.ai/docs/get-started/) (see the JavaScript example below).
 ## With ONNX (recommended)
+### Python
 ```python
 import onnxruntime
 import pandas as pd
     print(f"Likelihood of being a phishing site: {proba[1] * 100:.2f}%")
     print("----")
+# Expected output:
 # URL: https://www.rga.com/about/workplace
 # Likelihood of being a phishing site: 0.89%
 # ----
 ```
+### JavaScript
+```javascript
+const ort = require('onnxruntime-node');
+const data = [
+    {
+        "url": "http://rapidpaws.com/wp-content/we_transfer/index2.php?email=/",
+        "nb_hyperlinks": 1,
+        "ratio_intHyperlinks": 1,
+        "ratio_extHyperlinks": 0,
+        "ratio_extRedirection": 0,
+        "safe_anchor": 0,
+        "domain_registration_length": 338,
+        "domain_age": 0,
+        "web_traffic":1853,
+        "google_index": 1,
+        "page_rank": 2,
+    },
+];
+async function main() {
+    try {
+        // Make sure you have downloaded model.onnx into ./models/
+        // Creating an ONNX inference session with the specified model
+        const model_path = "./models/model.onnx";
+        const session = await ort.InferenceSession.create(model_path);
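+        // Creating an ONNX tensor from the input data:
+        // drop the "url" field and keep the 10 numeric features,
+        // which must appear in the same order as the model's training columns
+        const inputs = data.map(url => Object.values(url).slice(1));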
+        const flattenInputs = inputs.flat();
+        const tensor = new ort.Tensor('float32', flattenInputs, [inputs.length, 10]);
+        // Executing the inference session with the input tensor
+        const results = await session.run({"X": tensor});
+        const probas = results['probabilities'].data;
+        // Displaying results for each URL
+        data.forEach((url, index) => {
+            // Probabilities are flattened row by row as [n, 2]; index 1 is the phishing class
+            const proba = probas[index * 2 + 1];
+            const percent = (proba * 100).toFixed(2);
+            console.log(`URL: ${url.url}`);
+            console.log(`Likelihood of being a phishing site: ${percent}%`);
+            console.log("----");
+        });
+    } catch (e) {
+        console.error(`Failed to run inference on the ONNX model: ${e}.`);
+    }
+}
+main();
+// Output format (the probability value depends on the model):
+// URL: http://rapidpaws.com/wp-content/we_transfer/index2.php?email=/
+// Likelihood of being a phishing site: <probability>%
+// ----
+```
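The README's warning that unpickling can allow code injection can be illustrated with a minimal, harmless Python sketch. `Payload` is a hypothetical class written for this example, and `eval("6 * 7")` stands in for the arbitrary command (for example `os.system(...)`) that a malicious file could run. The point is that `pickle.loads` calls whatever callable the serialized `__reduce__` data names, before any of your own code gets to inspect the result.

```python
import pickle


class Payload:
    """Hypothetical untrusted object that abuses pickle's __reduce__ hook."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" instead of the object's state.
        # A real attacker would name something like os.system here.
        return (eval, ("6 * 7",))


# The attacker serializes the payload and ships the bytes as a "model file".
blob = pickle.dumps(Payload())

# The victim only deserializes, yet pickle.loads runs eval() to rebuild the object.
restored = pickle.loads(blob)

print(type(restored).__name__, restored)  # int 42: eval ran, no Payload came back
```

This is why a `model.pkl` from an untrusted source should not be loaded; an `.onnx` file is a protobuf-encoded graph read by the ONNX Runtime, so opening it does not go through pickle.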

model.onnx CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2da39da2b64ae2f867756f434a057eb162a75e806a5ec9d99a5c70fc592429ad
-size 22119084

 version https://git-lfs.github.com/spec/v1
+oid sha256:d4685fcb655e211a8e5c1acfeea93377b4e7005b6d5e8670e727d75b3a08b3d1
+size 22232006

model.pkl CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:37d17f4d2391fa9d6acdeab1dd0c9668a269ca2ed2823f4fb478e5b262c49afa
-size 45840213

 version https://git-lfs.github.com/spec/v1
+oid sha256:57bd4b9a3920643dabf348f853ca7710803ac2654b3f91577cc2a38f581d6908
+size 46071707
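The updated Git LFS pointers record each artifact's size in bytes, which gives a quick way to check the README's claim that the ONNX model is about half the size of the pickle model. The sketch below parses the two new pointers from this commit; `parse_lfs_pointer` is a hypothetical helper written for this example, and the `version` / `oid` / `size` line layout is the one used by Git LFS pointer files.

```python
# Git LFS pointer texts copied from the new side of this commit.
ONNX_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:d4685fcb655e211a8e5c1acfeea93377b4e7005b6d5e8670e727d75b3a08b3d1
size 22232006
"""

PKL_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:57bd4b9a3920643dabf348f853ca7710803ac2654b3f91577cc2a38f581d6908
size 46071707
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Hypothetical helper: split each 'key value' line of a pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


onnx = parse_lfs_pointer(ONNX_POINTER)
pkl = parse_lfs_pointer(PKL_POINTER)

onnx_size = int(onnx["size"])
pkl_size = int(pkl["size"])
ratio = onnx_size / pkl_size

print(f"model.onnx: {onnx_size / 1e6:.1f} MB")  # model.onnx: 22.2 MB
print(f"model.pkl:  {pkl_size / 1e6:.1f} MB")   # model.pkl:  46.1 MB
print(f"ONNX / pickle size ratio: {ratio:.2f}")  # ONNX / pickle size ratio: 0.48
```

The old pointers in this diff (22119084 and 45840213 bytes) give the same ratio of about 0.48, so "half the size" holds before and after this commit.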