Update README.md
README.md
CHANGED
@@ -1,3 +1,100 @@
+---
+language:
+- multilingual
+- af
+- am
+- ar
+- as
+- az
+- be
+- bg
+- bn
+- br
+- bs
+- ca
+- cs
+- cy
+- da
+- de
+- el
+- en
+- eo
+- es
+- et
+- eu
+- fa
+- fi
+- fr
+- fy
+- ga
+- gd
+- gl
+- gu
+- ha
+- he
+- hi
+- hr
+- hu
+- hy
+- id
+- is
+- it
+- ja
+- jv
+- ka
+- kk
+- km
+- kn
+- ko
+- ku
+- ky
+- la
+- lo
+- lt
+- lv
+- mg
+- mk
+- ml
+- mn
+- mr
+- ms
+- my
+- ne
+- nl
+- no
+- om
+- or
+- pa
+- pl
+- ps
+- pt
+- ro
+- ru
+- sa
+- sd
+- si
+- sk
+- sl
+- so
+- sq
+- sr
+- su
+- sv
+- sw
+- ta
+- te
+- th
+- tl
+- tr
+- ug
+- uk
+- ur
+- uz
+- vi
+- xh
+- yi
+- zh
+---
 # Model Description
 
 The XLM-RoBERTa model was proposed in [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multilingual language model trained on 2.5TB of filtered CommonCrawl data. This model is [XLM-RoBERTa-large](https://huggingface.co/xlm-roberta-large) fine-tuned on the English [conll2003](https://huggingface.co/datasets/conll2003) dataset.
@@ -14,9 +111,9 @@ Hong Kong Legal Information Institute [HKILL](https://www.hklii.hk/eng/) is a fr
 
 The model is a pretrained and fine-tuned language model. It can be used for document classification and Named Entity Recognition (NER), especially in the legal domain.
 ```python
->>> from transformers import pipeline
->>> tokenizer = AutoTokenizer.from_pretrained("xlm-
->>> model = AutoModelForTokenClassification.from_pretrained("xlm-
+>>> from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification
+>>> tokenizer = AutoTokenizer.from_pretrained("hklegal-xlm-r-large-t")
+>>> model = AutoModelForTokenClassification.from_pretrained("hklegal-xlm-r-large-t")
 >>> classifier = pipeline("ner", model=model, tokenizer=tokenizer)
 >>> classifier("Alya told Jasmine that Andrew could pay with cash..")
 ```
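
Without an aggregation setting, the `transformers` NER pipeline used in the README snippet returns one prediction per tagged sub-word token rather than one per entity. The sketch below shows one possible way to merge such predictions into entity spans. It assumes each prediction is a dict with `entity`, `index`, `start` and `end` keys and that the labels follow an IOB scheme such as `B-PER`/`I-PER`, as in conll2003. The `group_entities` helper and the `SAMPLE_TOKENS` predictions are hypothetical and written by hand for illustration; they are not real output from this model.

```python
from typing import Dict, List

TEXT = "Alya told Jasmine that Andrew could pay with cash.."

# Hypothetical token-level predictions (hand-written, NOT real model output).
SAMPLE_TOKENS: List[Dict] = [
    {"entity": "B-PER", "index": 1, "word": "▁Al", "start": 0, "end": 2},
    {"entity": "I-PER", "index": 2, "word": "ya", "start": 2, "end": 4},
    {"entity": "B-PER", "index": 4, "word": "▁Jasmin", "start": 10, "end": 16},
    {"entity": "I-PER", "index": 5, "word": "e", "start": 16, "end": 17},
    {"entity": "B-PER", "index": 7, "word": "▁Andrew", "start": 23, "end": 29},
]


def group_entities(text: str, tokens: List[Dict]) -> List[Dict]:
    """Merge token-level IOB predictions into entity spans via character offsets."""
    spans: List[Dict] = []
    for tok in tokens:
        # Split "B-PER" into the IOB prefix ("B") and the entity type ("PER").
        prefix, _, label = tok["entity"].partition("-")
        last = spans[-1] if spans else None
        # A token continues the previous span when it has the same type, directly
        # follows it in the token sequence, and is either tagged I- or is a
        # sub-word piece starting exactly where the span currently ends.
        continues = (
            last is not None
            and last["label"] == label
            and tok["index"] == last["last_index"] + 1
            and (prefix == "I" or tok["start"] == last["end"])
        )
        if continues:
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
        else:
            spans.append(
                {
                    "label": label,
                    "start": tok["start"],
                    "end": tok["end"],
                    "last_index": tok["index"],
                }
            )
    # Recover the surface text from the offsets and drop the bookkeeping field.
    return [
        {
            "label": s["label"],
            "text": text[s["start"]:s["end"]],
            "start": s["start"],
            "end": s["end"],
        }
        for s in spans
    ]


for span in group_entities(TEXT, SAMPLE_TOKENS):
    print(span["label"], span["text"])
```

In practice, passing `aggregation_strategy="simple"` to `pipeline("ner", ...)` asks the library to do a similar grouping itself; the manual version above is only meant to make the token-versus-entity distinction visible.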