skythrone committed on
Commit 7b478b5 · verified · 1 Parent(s): 4825efd

update readme.md

Files changed (1)
  1. README.md +71 -3
README.md CHANGED
@@ -1,3 +1,71 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ tags:
+ - privacy
+ - policy-analysis
+ - classification
+ - text-classification
+ - transformers
+ - distilbert
+ library_name: transformers
+ datasets:
+ - opp-115
+ model-index:
+ - name: Privacy Clause Classifier (DistilBERT - OPP-115)
+   results: []
+ ---
+
+ # Privacy Clause Classifier (DistilBERT - OPP-115)
+
+ This DistilBERT model is fine-tuned to classify **privacy policy clauses** into one of the ten privacy-practice categories of the [OPP-115 dataset](https://privacy-hosting.isi.edu/data/OPP-115.pdf):
+
+ | ID | Category                             |
+ |----|--------------------------------------|
+ | 0  | Data Retention                       |
+ | 1  | Data Security                        |
+ | 2  | Do Not Track                         |
+ | 3  | First Party Collection/Use           |
+ | 4  | International and Specific Audiences |
+ | 5  | Other                                |
+ | 6  | Policy Change                        |
+ | 7  | Third Party Sharing/Collection       |
+ | 8  | User Access, Edit and Deletion       |
+ | 9  | User Choice/Control                  |
+
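As a minimal sketch, the category table can be written as a lookup dictionary that turns a predicted class ID into a readable name. The names `ID2LABEL` and `label_for` are illustrative and not part of the model repository, and the mapping assumes the model's output indices follow the table order.

```python
# Hypothetical helper: maps the class IDs from the table to category names.
ID2LABEL = {
    0: "Data Retention",
    1: "Data Security",
    2: "Do Not Track",
    3: "First Party Collection/Use",
    4: "International and Specific Audiences",
    5: "Other",
    6: "Policy Change",
    7: "Third Party Sharing/Collection",
    8: "User Access, Edit and Deletion",
    9: "User Choice/Control",
}


def label_for(class_id: int) -> str:
    """Return the category name for a class ID, or raise on an unknown ID."""
    try:
        return ID2LABEL[class_id]
    except KeyError:
        raise ValueError(f"Unknown class ID: {class_id}") from None


print(label_for(3))
```

If the model's own configuration defines a label mapping, that mapping should take precedence over this hand-written one.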
+ ---
+
+ ## Model Details
+
+ - **Architecture**: DistilBERT (pretrained)
+ - **Fine-tuning Dataset**: [OPP-115 Dataset](https://privacy-hosting.isi.edu/data/OPP-115.pdf)
+ - **Input Format**: Text snippets from privacy policies
+ - **Output Format**: Predicted class label with probabilities
+
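The "class label with probabilities" output can be illustrated with a small sketch. It assumes the classifier returns one raw logit per category, as sequence-classification heads typically do, and that a softmax converts the logits to probabilities. The logit values below are made up for illustration and are not real model output.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for the ten categories (illustration only).
example_logits = [0.1, -1.2, -2.0, 3.4, -0.5, 0.3, -1.0, 1.8, -0.7, 0.0]

probs = softmax(example_logits)
# The predicted class is the index with the highest probability.
predicted_id = max(range(len(probs)), key=probs.__getitem__)

print(f"Predicted class ID: {predicted_id}")
print(f"Probability: {probs[predicted_id]:.3f}")
```

With real model output, the same step would normally be done on the logits tensor with `torch.softmax(logits, dim=-1)`.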
+ ---
+
+ ## Intended Uses
+
+ - Automatic **privacy policy clause classification**
+ - **Regulatory technology (RegTech)** tools
+ - **Privacy policy summarization** and simplification
+ - **Risk analysis** for data sharing and collection practices
+
+ ---
+
+ ## How to Use
+
+ ```python
+ from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
+ import torch
+
+ # Load the tokenizer and model (replace the placeholder repo ID with the real one)
+ tokenizer = DistilBertTokenizerFast.from_pretrained("your-hf-username/your-model-name")
+ model = DistilBertForSequenceClassification.from_pretrained("your-hf-username/your-model-name")
+ model.eval()  # inference mode: disables dropout
+
+ # Predict the category ID for a single clause
+ text = "We may collect your location data to provide customized services."
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+ with torch.no_grad():  # gradients are not needed for inference
+     outputs = model(**inputs)
+ predicted_class = torch.argmax(outputs.logits, dim=-1).item()
+
+ print(f"Predicted Category: {predicted_class}")