Jakh0103 committed 5558ce6 (verified) · 1 parent: aef8df3

Create README.md

Files changed (1)
  1. README.md +36 -0
README.md ADDED
---
datasets:
- cis-lmu/glotlid-corpus
pipeline_tag: text-classification
---

## Description
A language identification model that supports more than 2,000 languages. Each label joins a three-letter ISO 639-3 language code and a four-letter ISO 15924 script code with an underscore (for example, `eng_Latn`). For the full list of supported labels, see [labels.json](https://huggingface.co/Jakh0103/lid/blob/main/labels.json).

Repository: [GitHub](https://github.com/epfl-nlp/language-identification)

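The label format can be illustrated with a small sketch. `split_label` and `group_by_script` are hypothetical helpers, not part of the `LID` API, and they assume every label has the form `<code>_<Script>`, as in the labels shown in the usage example (`aai_Latn`, `eng_Latn`, ...).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_label(label: str) -> Tuple[str, str]:
    """Split a label such as 'eng_Latn' into ('eng', 'Latn') (hypothetical helper).

    Assumes the language code and script are separated by the first underscore.
    """
    code, script = label.split("_", 1)
    return code, script


def group_by_script(labels: List[str]) -> Dict[str, List[str]]:
    """Map each script code to the language codes that use it (hypothetical helper)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for label in labels:
        code, script = split_label(label)
        groups[script].append(code)
    return dict(groups)


# Sample labels taken from the example output in this README
labels = ["aai_Latn", "aak_Latn", "eng_Latn", "sco_Latn", "jam_Latn"]
print(split_label("eng_Latn"))   # ('eng', 'Latn')
print(group_by_script(labels))   # {'Latn': ['aai', 'aak', 'eng', 'sco', 'jam']}
```

With the real model, the same helpers could be applied to the list returned by `model.get_labels()`.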
## Usage

**Download the model**
```python
from huggingface_hub import snapshot_download

# Download all files of the Jakh0103/lid repository into ./checkpoint
snapshot_download(repo_id="Jakh0103/lid", local_dir="checkpoint")
```

**Use the model**
```python
# Assumes the module that defines LID (model.py) is importable from your Python path
from model import LID

# Load the checkpoint downloaded in the previous step
model = LID.from_pretrained(dir='checkpoint')

# Print the supported labels
print(model.get_labels())
# ['aai_Latn', 'aak_Latn', 'aau_Latn', 'aaz_Latn', 'aba_Latn', ...]

# Top-1 prediction: returns (labels, probabilities)
model.predict("The cat climbed onto the roof to enjoy the warm sunlight peacefully!")
# (['eng_Latn'], [0.970989465713501])

# Top-3 predictions
model.predict("The cat climbed onto the roof to enjoy the warm sunlight peacefully!", k=3)
# (['eng_Latn', 'sco_Latn', 'jam_Latn'], [0.970989465713501, 0.006496887654066086, 0.00487488554790616])
```
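A common next step is to accept the top prediction only when the model is confident enough. The sketch below wraps the `predict(text, k=...)` call shown above. `identify`, `StubLID`, and the 0.5 default threshold are illustrative assumptions, not part of the repository, and the sketch assumes probabilities come back sorted best-first, as in the example output. `StubLID` replays that example output so the snippet runs without the checkpoint; with the real model, pass the object returned by `LID.from_pretrained(dir='checkpoint')` instead.

```python
from typing import List, Optional, Tuple


def identify(model, text: str, threshold: float = 0.5, k: int = 3) -> Optional[str]:
    """Return the top label if its probability is at least `threshold`, else None.

    Hypothetical helper: `model` is any object whose predict(text, k=...) returns
    (labels, probabilities), assumed ordered best-first.
    """
    labels, probs = model.predict(text, k=k)
    if labels and probs[0] >= threshold:
        return labels[0]
    return None


class StubLID:
    """Stand-in for LID (assumption): replays the README's example output for any input."""

    def predict(self, text: str, k: int = 1) -> Tuple[List[str], List[float]]:
        labels = ["eng_Latn", "sco_Latn", "jam_Latn"]
        probs = [0.970989465713501, 0.006496887654066086, 0.00487488554790616]
        return labels[:k], probs[:k]


stub = StubLID()
text = "The cat climbed onto the roof to enjoy the warm sunlight peacefully!"
print(identify(stub, text))                  # eng_Latn
print(identify(stub, text, threshold=0.99))  # None
```

Returning `None` instead of a low-confidence label lets callers handle uncertain inputs explicitly; a suitable threshold depends on your data.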