---
language: en
tags:
- exbert
license: apache-2.0
datasets:
- bookcorpus
- wikipedia
---

# Finetuned DistilBERT

This model is a distilled version of the [BERT base model](https://huggingface.co/bert-base-uncased). It was
introduced in [this paper](https://arxiv.org/abs/1910.01108). The code for the distillation process can be found
[here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation). This model is uncased: it does
not make a difference between english and English.

## Model description

DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a
self-supervised fashion, using the BERT base model as a teacher. This model was further finetuned on the DBpedia dataset, which can be found
[here](https://huggingface.co/datasets/DeveloperOats/DBPedia_Classes). The dataset consists of 342,782 Wikipedia articles that have been cleaned and classified into hierarchical classes.
The classification system spans three levels, with 9 classes at the first level, 70 classes at the second level,
and 219 classes at the third level.

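To see how these label levels are organized, the finetuning data can be inspected directly with the `datasets` library. The snippet below is a minimal sketch, not part of this model's training code: the split name and the `text`/`l1`/`l2`/`l3` column names are assumptions about the DBPedia_Classes layout, so check the printed dataset description before relying on them.

```python
from datasets import load_dataset

# Minimal sketch: inspect the hierarchical labels of the finetuning data.
# Split and column names ("train", "text", "l1", "l2", "l3") are assumptions.
dbpedia = load_dataset("DeveloperOats/DBPedia_Classes")
print(dbpedia)  # shows the available splits and column names

example = dbpedia["train"][0]
print(example["text"][:200])                        # start of the Wikipedia article
print(example["l1"], example["l2"], example["l3"])  # level-1/2/3 class labels
```
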
## Intended uses & limitations

You can use the model to extract structured content and organize it into taxonomic categories.

### How to use

You can use this model directly with a pipeline:

```python
from transformers import pipeline
import numpy as np

text = "This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three."

# Load the finetuned first-level (L1) classifier from the Hub.
classifier = pipeline("text-classification", model="carbonnnnn/T2L1DISTILBERT")

# The pipeline returns generic labels (LABEL_0 ... LABEL_8); map them back to
# the human-readable first-level class names stored in l1.txt.
labeltxt = np.loadtxt("TASK2/label_vals/l1.txt", dtype="str")
labelint = ['LABEL_0', 'LABEL_1', 'LABEL_2', 'LABEL_3', 'LABEL_4',
            'LABEL_5', 'LABEL_6', 'LABEL_7', 'LABEL_8']

output = classifier(text)[0]['label']
for i in range(len(labelint)):
    if output == labelint[i]:
        print("Output is: " + str(labeltxt[i]))
```

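Continuing from the snippet above, you can also ask the pipeline for a score on every first-level class instead of only the top label. This is a minimal sketch: it assumes a recent `transformers` release where `top_k=None` returns all scores (older releases used `return_all_scores=True`), so the exact keyword is version-dependent.

```python
# Sketch: get a score for every first-level class, not just the top one.
result = classifier(text, top_k=None)

# Depending on the transformers version the result may be nested one level deep.
scores = result[0] if isinstance(result[0], list) else result

for entry in sorted(scores, key=lambda e: e["score"], reverse=True):
    print(f'{entry["label"]}: {entry["score"]:.4f}')
```
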
### Limitations and bias

Even if the training data used for this model could be characterized as fairly neutral, this model can have biased
predictions. It also inherits some of
[the bias of its teacher model](https://huggingface.co/bert-base-uncased#limitations-and-bias).

## Evaluation results

### BibTeX entry and citation info

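If you use this model, you may want to cite the DistilBERT paper linked above:

```bibtex
@article{Sanh2019DistilBERT,
  title   = {DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
  author  = {Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
  journal = {arXiv preprint arXiv:1910.01108},
  year    = {2019}
}
```
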
<a href="https://huggingface.co/exbert/?model=distilbert-base-uncased">
<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>