phjhk commited on
Commit
4e4df97
·
1 Parent(s): 8386232

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +100 -3
README.md CHANGED
@@ -1,3 +1,100 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  # Model Description
2
 
3
  The XLM-RoBERTa model was proposed in [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data. This model is [XLM-RoBERTa-large](https://huggingface.co/xlm-roberta-large) fine-tuned with the [conll2003](https://huggingface.co/datasets/conll2003) dataset in English.
@@ -14,9 +111,9 @@ Hong Kong Legal Information Institute [HKILL](https://www.hklii.hk/eng/) is a fr
14
 
15
  The model is a pretrained-finetuned language model. The model can be used for document classification, Named Entity Recognition (NER), especially on legal domain.
16
  ```python
17
- >>> from transformers import pipeline
18
- >>> tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large-finetuned-conll03-english")
19
- >>> model = AutoModelForTokenClassification.from_pretrained("xlm-roberta-large-finetuned-conll03-english")
20
  >>> classifier = pipeline("ner", model=model, tokenizer=tokenizer)
21
  >>> classifier("Alya told Jasmine that Andrew could pay with cash..")
22
  ```
 
1
+ ---
2
+ language:
3
+ - multilingual
4
+ - af
5
+ - am
6
+ - ar
7
+ - as
8
+ - az
9
+ - be
10
+ - bg
11
+ - bn
12
+ - br
13
+ - bs
14
+ - ca
15
+ - cs
16
+ - cy
17
+ - da
18
+ - de
19
+ - el
20
+ - en
21
+ - eo
22
+ - es
23
+ - et
24
+ - eu
25
+ - fa
26
+ - fi
27
+ - fr
28
+ - fy
29
+ - ga
30
+ - gd
31
+ - gl
32
+ - gu
33
+ - ha
34
+ - he
35
+ - hi
36
+ - hr
37
+ - hu
38
+ - hy
39
+ - id
40
+ - is
41
+ - it
42
+ - ja
43
+ - jv
44
+ - ka
45
+ - kk
46
+ - km
47
+ - kn
48
+ - ko
49
+ - ku
50
+ - ky
51
+ - la
52
+ - lo
53
+ - lt
54
+ - lv
55
+ - mg
56
+ - mk
57
+ - ml
58
+ - mn
59
+ - mr
60
+ - ms
61
+ - my
62
+ - ne
63
+ - nl
64
+ - no
65
+ - om
66
+ - or
67
+ - pa
68
+ - pl
69
+ - ps
70
+ - pt
71
+ - ro
72
+ - ru
73
+ - sa
74
+ - sd
75
+ - si
76
+ - sk
77
+ - sl
78
+ - so
79
+ - sq
80
+ - sr
81
+ - su
82
+ - sv
83
+ - sw
84
+ - ta
85
+ - te
86
+ - th
87
+ - tl
88
+ - tr
89
+ - ug
90
+ - uk
91
+ - ur
92
+ - uz
93
+ - vi
94
+ - xh
95
+ - yi
96
+ - zh
97
+ ---
98
  # Model Description
99
 
100
  The XLM-RoBERTa model was proposed in [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data. This model is [XLM-RoBERTa-large](https://huggingface.co/xlm-roberta-large) fine-tuned with the [conll2003](https://huggingface.co/datasets/conll2003) dataset in English.
 
111
 
112
  The model is a pretrained-finetuned language model. The model can be used for document classification, Named Entity Recognition (NER), especially on legal domain.
113
  ```python
114
+ >>> from transformers import pipeline,AutoTokenizer,AutoModelForTokenClassification
115
+ >>> tokenizer = AutoTokenizer.from_pretrained("hklegal-xlm-r-large-t")
116
+ >>> model = AutoModelForTokenClassification.from_pretrained("hklegal-xlm-r-large-t")
117
  >>> classifier = pipeline("ner", model=model, tokenizer=tokenizer)
118
  >>> classifier("Alya told Jasmine that Andrew could pay with cash..")
119
  ```