alvarobartt (HF staff) committed on
Commit 56da2c8 (1 parent: b4e79bd)

Update README.md

Files changed (1)
  1. README.md +112 -10
README.md CHANGED
@@ -1,4 +1,6 @@
  ---
  library_name: span-marker
  tags:
  - span-marker
@@ -10,31 +12,77 @@ metrics:
  - precision
  - recall
  - f1
- widget: []
  pipeline_tag: token-classification
  ---

- # SpanMarker

- This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Named Entity Recognition.

  ## Model Details

  ### Model Description

  - **Model Type:** SpanMarker
- <!-- - **Encoder:** [Unknown](https://huggingface.co/models/unknown) -->
  - **Maximum Sequence Length:** 256 tokens
  - **Maximum Entity Length:** 8 words
  <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->

  ### Model Sources

  - **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
  - **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)

  ## Uses

  ### Direct Use for Inference
@@ -43,9 +91,9 @@ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that ca
  from span_marker import SpanMarkerModel

  # Download from the 🤗 Hub
- model = SpanMarkerModel.from_pretrained("span_marker_model_id")
  # Run inference
- entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.")
  ```

  ### Downstream Use
@@ -57,7 +105,7 @@ You can finetune this model on your own dataset.
  from span_marker import SpanMarkerModel, Trainer

  # Download from the 🤗 Hub
- model = SpanMarkerModel.from_pretrained("span_marker_model_id")

  # Specify a Dataset with "tokens" and "ner_tag" columns
  dataset = load_dataset("conll2003") # For example CoNLL2003
@@ -69,7 +117,7 @@ trainer = Trainer(
  eval_dataset=dataset["validation"],
  )
  trainer.train()
- trainer.save_model("span_marker_model_id-finetuned")
  ```
  </details>
 
@@ -93,6 +141,60 @@ trainer.save_model("span_marker_model_id-finetuned")

  ## Training Details

  ### Framework Versions

  - Python: 3.10.12
 
  ---
+ language: es
+ license: cc-by-4.0
  library_name: span-marker
  tags:
  - span-marker
 
  - precision
  - recall
  - f1
+ widget:
+ - text: Por otro lado , el primer ministro portugués , Antonio Guterres , presidente
+     de turno del Consejo Europeo , recibió hoy al ministro del Interior de Colombia
+     , Hugo de la Calle , enviado especial del presidente de su país , Andrés Pastrana
+     .
+ - text: Los consejeros de la Presidencia , Gaspar Zarrías , de Justicia , Carmen Hermosín
+     , y de Asuntos Sociales , Isaías Pérez Saldaña , darán comienzo mañana a los turnos
+     de comparecencias de los miembros del Gobierno andaluz en el Parlamento autonómico
+     para informar de las líneas de actuación de sus departamentos .
+ - text: '( SV2147 ) PP : PROBLEMAS INTERNOS PSOE INTERFIEREN EN POLITICA DE LA JUNTA
+     Córdoba ( EFE ) .'
+ - text: Cuando vino a Soria , en febrero de 1998 , para sustituir al entonces destituido
+     Antonio Gómez , estaba dirigiendo al Badajoz B en tercera división y consiguió
+     con el Numancia la permanencia en la última jornada frente al Hércules .
+ - text: El ministro ecuatoriano de Defensa , Hugo Unda , aseguró hoy que las Fuerzas
+     Armadas respetarán la decisión del Parlamento sobre la amnistía para los involucrados
+     en la asonada golpista del pasado 21 de enero , cuando fue derrocado el presidente
+     Jamil Mahuad .
  pipeline_tag: token-classification
+ base_model: xlm-roberta-large
+ model-index:
+ - name: SpanMarker with xlm-roberta-large on conll2002
+   results:
+   - task:
+       type: token-classification
+       name: Named Entity Recognition
+     dataset:
+       name: conll2002
+       type: unknown
+       split: eval
+     metrics:
+     - type: f1
+       value: 0.8911398300151355
+       name: F1
+     - type: precision
+       value: 0.8981459751232105
+       name: Precision
+     - type: recall
+       value: 0.8842421441774492
+       name: Recall
  ---

+ # SpanMarker with xlm-roberta-large on conll2002

+ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Named Entity Recognition. This SpanMarker model uses [xlm-roberta-large](https://huggingface.co/models/xlm-roberta-large) as the underlying encoder.

  ## Model Details

  ### Model Description

  - **Model Type:** SpanMarker
+ - **Encoder:** [xlm-roberta-large](https://huggingface.co/models/xlm-roberta-large)
  - **Maximum Sequence Length:** 256 tokens
  - **Maximum Entity Length:** 8 words
  <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ - **Language:** es
+ - **License:** cc-by-4.0

  ### Model Sources

  - **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
  - **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)

+ ### Model Labels
+ | Label | Examples |
+ |:------|:------------------------------------------------------------------|
+ | LOC | "Melbourne", "Australia", "Victoria" |
+ | MISC | "CrimeNet", "Ciudad", "Ley" |
+ | ORG | "Commonwealth", "Tribunal Supremo", "EFE" |
+ | PER | "Abogado General del Estado", "Daryl Williams", "Abogado General" |
+
  ## Uses

  ### Direct Use for Inference
 
  from span_marker import SpanMarkerModel

  # Download from the 🤗 Hub
+ model = SpanMarkerModel.from_pretrained("alvarobartt/span-marker-xlm-roberta-large-conll-2002-es")
  # Run inference
+ entities = model.predict("( SV2147 ) PP : PROBLEMAS INTERNOS PSOE INTERFIEREN EN POLITICA DE LA JUNTA Córdoba ( EFE ) .")
  ```

  ### Downstream Use
 
  from span_marker import SpanMarkerModel, Trainer

  # Download from the 🤗 Hub
+ model = SpanMarkerModel.from_pretrained("alvarobartt/span-marker-xlm-roberta-large-conll-2002-es")

  # Specify a Dataset with "tokens" and "ner_tag" columns
  dataset = load_dataset("conll2003") # For example CoNLL2003
 
  eval_dataset=dataset["validation"],
  )
  trainer.train()
+ trainer.save_model("alvarobartt/span-marker-xlm-roberta-large-conll-2002-es-finetuned")
  ```
  </details>
 
 
  ## Training Details

+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:----------------------|:----|:--------|:-----|
+ | Sentence length | 1 | 31.8052 | 1238 |
+ | Entities per sentence | 0 | 2.2586 | 160 |
+
+ ### Training Hyperparameters
+ - learning_rate: 1e-05
+ - train_batch_size: 16
+ - eval_batch_size: 8
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 2
+
+ ### Training Results
+ | Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
+ |:------:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
+ | 0.0587 | 50 | 0.4612 | 0.0280 | 0.0007 | 0.0014 | 0.8576 |
+ | 0.1174 | 100 | 0.0512 | 0.5 | 0.0002 | 0.0005 | 0.8609 |
+ | 0.1761 | 150 | 0.0254 | 0.7622 | 0.5494 | 0.6386 | 0.9278 |
+ | 0.2347 | 200 | 0.0177 | 0.7840 | 0.7135 | 0.7471 | 0.9483 |
+ | 0.2934 | 250 | 0.0153 | 0.8072 | 0.7944 | 0.8007 | 0.9662 |
+ | 0.3521 | 300 | 0.0175 | 0.8439 | 0.7544 | 0.7966 | 0.9611 |
+ | 0.4108 | 350 | 0.0103 | 0.8828 | 0.8108 | 0.8452 | 0.9687 |
+ | 0.4695 | 400 | 0.0105 | 0.8674 | 0.8433 | 0.8552 | 0.9724 |
+ | 0.5282 | 450 | 0.0098 | 0.8651 | 0.8477 | 0.8563 | 0.9745 |
+ | 0.5869 | 500 | 0.0092 | 0.8634 | 0.8306 | 0.8467 | 0.9736 |
+ | 0.6455 | 550 | 0.0106 | 0.8556 | 0.8581 | 0.8568 | 0.9758 |
+ | 0.7042 | 600 | 0.0096 | 0.8712 | 0.8521 | 0.8616 | 0.9733 |
+ | 0.7629 | 650 | 0.0090 | 0.8791 | 0.8420 | 0.8601 | 0.9740 |
+ | 0.8216 | 700 | 0.0082 | 0.8883 | 0.8799 | 0.8840 | 0.9769 |
+ | 0.8803 | 750 | 0.0081 | 0.8877 | 0.8604 | 0.8739 | 0.9763 |
+ | 0.9390 | 800 | 0.0087 | 0.8785 | 0.8738 | 0.8762 | 0.9763 |
+ | 0.9977 | 850 | 0.0084 | 0.8777 | 0.8653 | 0.8714 | 0.9767 |
+ | 1.0563 | 900 | 0.0081 | 0.8894 | 0.8713 | 0.8803 | 0.9767 |
+ | 1.1150 | 950 | 0.0078 | 0.8944 | 0.8708 | 0.8825 | 0.9768 |
+ | 1.1737 | 1000 | 0.0079 | 0.8973 | 0.8722 | 0.8846 | 0.9776 |
+ | 1.2324 | 1050 | 0.0080 | 0.8792 | 0.8780 | 0.8786 | 0.9783 |
+ | 1.2911 | 1100 | 0.0082 | 0.8821 | 0.8574 | 0.8696 | 0.9767 |
+ | 1.3498 | 1150 | 0.0075 | 0.8928 | 0.8697 | 0.8811 | 0.9774 |
+ | 1.4085 | 1200 | 0.0076 | 0.8919 | 0.8803 | 0.8860 | 0.9792 |
+ | 1.4671 | 1250 | 0.0078 | 0.8846 | 0.8695 | 0.8770 | 0.9781 |
+ | 1.5258 | 1300 | 0.0074 | 0.8944 | 0.8845 | 0.8894 | 0.9792 |
+ | 1.5845 | 1350 | 0.0076 | 0.8922 | 0.8856 | 0.8889 | 0.9796 |
+ | 1.6432 | 1400 | 0.0072 | 0.9004 | 0.8799 | 0.8900 | 0.9790 |
+ | 1.7019 | 1450 | 0.0076 | 0.8944 | 0.8889 | 0.8916 | 0.9800 |
+ | 1.7606 | 1500 | 0.0074 | 0.8962 | 0.8861 | 0.8911 | 0.9800 |
+ | 1.8192 | 1550 | 0.0072 | 0.8988 | 0.8886 | 0.8937 | 0.9809 |
+ | 1.8779 | 1600 | 0.0074 | 0.8962 | 0.8833 | 0.8897 | 0.9797 |
+ | 1.9366 | 1650 | 0.0071 | 0.8976 | 0.8849 | 0.8912 | 0.9799 |
+ | 1.9953 | 1700 | 0.0071 | 0.8981 | 0.8842 | 0.8911 | 0.9799 |
+
  ### Framework Versions

  - Python: 3.10.12
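
The new card lists `learning_rate: 1e-05`, `lr_scheduler_type: linear`, `lr_scheduler_warmup_ratio: 0.1` and `num_epochs: 2`, and its results table puts step 1700 at epoch 1.9953, which implies roughly 852 optimizer steps per epoch. The Python sketch below shows what a linear schedule with linear warmup looks like under those numbers. It assumes 2 × 852 = 1704 total steps and that the warmup length is rounded up with `math.ceil` (the convention used by recent `transformers` releases); neither assumption is stated in the commit, and `warmup_steps` / `linear_lr` are hypothetical helpers, not part of SpanMarker.

```python
import math

# Values from the model card. TOTAL_STEPS is an assumption inferred from the
# results table (about 852 steps per epoch x 2 epochs), not a logged value.
BASE_LR = 1e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 2 * 852


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of warmup steps, rounded up (assumed convention)."""
    return math.ceil(total_steps * warmup_ratio)


def linear_lr(step: int, total_steps: int = TOTAL_STEPS,
              base_lr: float = BASE_LR,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` for a linear schedule with linear warmup.

    The rate rises linearly from 0 to base_lr during warmup, then decays
    linearly to 0 at total_steps (and stays at 0 afterwards).
    """
    n_warmup = warmup_steps(total_steps, warmup_ratio)
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - n_warmup))


if __name__ == "__main__":
    # Print the hypothetical learning rate at a few logged steps.
    for step in (0, 50, 171, 850, 1700):
        print(f"step {step:>4}: lr = {linear_lr(step):.3e}")
```

Under these assumptions warmup covers the first 171 steps, after which the rate decays linearly and reaches zero at step 1704.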