---
pipeline_tag: token-classification
tags:
- named-entity-recognition
- sequence-tagger-model
widget:
- text: Numele meu este Amadeus Wolfgang și locuiesc în Berlin
inference:
  parameters:
    aggregation_strategy: simple
    grouped_entities: true
language:
- ro
---

XLM-RoBERTa base model fine-tuned on the [RoNEC](https://github.com/dumitrescustefan/ronec) dataset (Romanian Named Entity Corpus), reaching a macro F1 of ~0.95 on the test set.

| Test metric         | Result              |
|---------------------|---------------------|
| test_f1_mac_ronec   | 0.9547659158706665  |
| test_loss_ronec     | 0.16371206939220428 |
| test_prec_mac_ronec | 0.8663718700408936  |
| test_rec_mac_ronec  | 0.8695588111877441  |

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("EvanD/xlm-roberta-base-romanian-ner-ronec")
ner_model = AutoModelForTokenClassification.from_pretrained("EvanD/xlm-roberta-base-romanian-ner-ronec")

nlp = pipeline("ner", model=ner_model, tokenizer=tokenizer, aggregation_strategy="simple")
example = "Numele meu este Amadeus Wolfgang și locuiesc în Berlin"

ner_results = nlp(example)
print(ner_results)

# [
#     {
#         'entity_group': 'PER',
#         'score': 0.9966806,
#         'word': 'Amadeus Wolfgang',
#         'start': 16,
#         'end': 32
#     },
#     {'entity_group': 'GPE',
#      'score': 0.99694663,
#      'word': 'Berlin',
#      'start': 48,
#      'end': 54
#      }
# ]
```
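
With `aggregation_strategy="simple"`, the pipeline returns a list of dicts in the format shown above. As a minimal sketch (no model download required), the output can be post-processed into `(word, label)` pairs; the `min_score` threshold and the `extract_entities` helper are illustrative assumptions, not part of the model card.

```python
def extract_entities(ner_results, min_score=0.5):
    """Filter aggregated NER spans by confidence and return (word, label) pairs.

    `ner_results` is the list of dicts produced by the pipeline above;
    the 0.5 threshold is an arbitrary example value.
    """
    return [
        (ent["word"], ent["entity_group"])
        for ent in ner_results
        if ent["score"] >= min_score
    ]

# Sample output copied from the example above
sample = [
    {"entity_group": "PER", "score": 0.9966806, "word": "Amadeus Wolfgang", "start": 16, "end": 32},
    {"entity_group": "GPE", "score": 0.99694663, "word": "Berlin", "start": 48, "end": 54},
]

print(extract_entities(sample))
# [('Amadeus Wolfgang', 'PER'), ('Berlin', 'GPE')]
```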