alvarobartt/span-marker-xlm-roberta-large-conll-2002-es

@@ -1,4 +1,6 @@
 ---
 library_name: span-marker
 tags:
 - span-marker
@@ -10,31 +12,77 @@ metrics:
 - precision
 - recall
 - f1
-widget: []
 pipeline_tag: token-classification
 ---
-# SpanMarker
-This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Named Entity Recognition.
 ## Model Details
 ### Model Description
 - **Model Type:** SpanMarker
-<!-- - **Encoder:** [Unknown](https://huggingface.co/models/unknown) -->
 - **Maximum Sequence Length:** 256 tokens
 - **Maximum Entity Length:** 8 words
 <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
-<!-- - **Language:** Unknown -->
-<!-- - **License:** Unknown -->
 ### Model Sources
 - **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
 - **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)
 ## Uses
 ### Direct Use for Inference
@@ -43,9 +91,9 @@ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that ca
 from span_marker import SpanMarkerModel
 # Download from the 🤗 Hub
-model = SpanMarkerModel.from_pretrained("span_marker_model_id")
 # Run inference
-entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.")
 ```
 ### Downstream Use
@@ -57,7 +105,7 @@ You can finetune this model on your own dataset.
 from datasets import load_dataset
 from span_marker import SpanMarkerModel, Trainer
 # Download from the 🤗 Hub
-model = SpanMarkerModel.from_pretrained("span_marker_model_id")
 # Specify a Dataset with "tokens" and "ner_tags" columns
 dataset = load_dataset("conll2003") # For example CoNLL2003
@@ -69,7 +117,7 @@ trainer = Trainer(
     eval_dataset=dataset["validation"],
 )
 trainer.train()
-trainer.save_model("span_marker_model_id-finetuned")
 ```
 </details>
@@ -93,6 +141,60 @@ trainer.save_model("span_marker_model_id-finetuned")
 ## Training Details
 ### Framework Versions
 - Python: 3.10.12

 ---
+language: es
+license: cc-by-4.0
 library_name: span-marker
 tags:
 - span-marker
 - precision
 - recall
 - f1
+widget:
+- text: Por otro lado , el primer ministro portugués , Antonio Guterres , presidente
+    de turno del Consejo Europeo , recibió hoy al ministro del Interior de Colombia
+    , Hugo de la Calle , enviado especial del presidente de su país , Andrés Pastrana
+    .
+- text: Los consejeros de la Presidencia , Gaspar Zarrías , de Justicia , Carmen Hermosín
+    , y de Asuntos Sociales , Isaías Pérez Saldaña , darán comienzo mañana a los turnos
+    de comparecencias de los miembros del Gobierno andaluz en el Parlamento autonómico
+    para informar de las líneas de actuación de sus departamentos .
+- text: '( SV2147 ) PP : PROBLEMAS INTERNOS PSOE INTERFIEREN EN POLITICA DE LA JUNTA
+    Córdoba ( EFE ) .'
+- text: Cuando vino a Soria , en febrero de 1998 , para sustituir al entonces destituido
+    Antonio Gómez , estaba dirigiendo al Badajoz B en tercera división y consiguió
+    con el Numancia la permanencia en la última jornada frente al Hércules .
+- text: El ministro ecuatoriano de Defensa , Hugo Unda , aseguró hoy que las Fuerzas
+    Armadas respetarán la decisión del Parlamento sobre la amnistía para los involucrados
+    en la asonada golpista del pasado 21 de enero , cuando fue derrocado el presidente
+    Jamil Mahuad .
 pipeline_tag: token-classification
+base_model: xlm-roberta-large
+model-index:
+- name: SpanMarker with xlm-roberta-large on conll2002
+  results:
+  - task:
+      type: token-classification
+      name: Named Entity Recognition
+    dataset:
+      name: conll2002
+      type: unknown
+      split: eval
+    metrics:
+    - type: f1
+      value: 0.8911398300151355
+      name: F1
+    - type: precision
+      value: 0.8981459751232105
+      name: Precision
+    - type: recall
+      value: 0.8842421441774492
+      name: Recall
 ---
+# SpanMarker with xlm-roberta-large on conll2002
+This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model for Named Entity Recognition. It uses [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) as the underlying encoder.
 ## Model Details
 ### Model Description
 - **Model Type:** SpanMarker
+- **Encoder:** [xlm-roberta-large](https://huggingface.co/xlm-roberta-large)
 - **Maximum Sequence Length:** 256 tokens
 - **Maximum Entity Length:** 8 words
 <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+- **Language:** es
+- **License:** cc-by-4.0
 ### Model Sources
 - **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
 - **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)
+### Model Labels
+| Label | Examples                                                          |
+|:------|:------------------------------------------------------------------|
+| LOC   | "Melbourne", "Australia", "Victoria"                              |
+| MISC  | "CrimeNet", "Ciudad", "Ley"                                       |
+| ORG   | "Commonwealth", "Tribunal Supremo", "EFE"                         |
+| PER   | "Abogado General del Estado", "Daryl Williams", "Abogado General" |
 ## Uses
 ### Direct Use for Inference
 from span_marker import SpanMarkerModel
 # Download from the 🤗 Hub
+model = SpanMarkerModel.from_pretrained("alvarobartt/span-marker-xlm-roberta-large-conll-2002-es")
 # Run inference
+entities = model.predict("( SV2147 ) PP : PROBLEMAS INTERNOS PSOE INTERFIEREN EN POLITICA DE LA JUNTA Córdoba ( EFE ) .")
 ```
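The list returned by `model.predict` can be post-processed with plain Python. The sketch below assumes each entity is a dict with `span`, `label`, and `score` keys, which is how SpanMarker reports predictions for a single string input. `format_entities` is a hypothetical helper, and the `sample` entries (including their scores) are hand-written placeholders, not actual output of this model.

```python
def format_entities(entities, min_score=0.5):
    """Return 'LABEL: span (score)' strings for entities scoring at least min_score."""
    return [
        f"{e['label']}: {e['span']} ({e['score']:.2f})"
        for e in entities
        if e["score"] >= min_score
    ]


# Placeholder entities shaped like a prediction result (assumed keys, made-up scores).
sample = [
    {"span": "PSOE", "label": "ORG", "score": 0.98},
    {"span": "Córdoba", "label": "LOC", "score": 0.95},
    {"span": "JUNTA", "label": "ORG", "score": 0.40},
]

for line in format_entities(sample):
    print(line)
```

Filtering on `score` is a design choice for this sketch only; the threshold of 0.5 is arbitrary and should be tuned on held-out data.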
 ### Downstream Use
 from datasets import load_dataset
 from span_marker import SpanMarkerModel, Trainer
 # Download from the 🤗 Hub
+model = SpanMarkerModel.from_pretrained("alvarobartt/span-marker-xlm-roberta-large-conll-2002-es")
 # Specify a Dataset with "tokens" and "ner_tags" columns
 dataset = load_dataset("conll2003") # For example CoNLL2003
     eval_dataset=dataset["validation"],
 )
 trainer.train()
+trainer.save_model("alvarobartt/span-marker-xlm-roberta-large-conll-2002-es-finetuned")
 ```
 </details>
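If your own data is stored as tokens with BIO tag strings, it needs to be converted to integer tags before fine-tuning. The following is a minimal sketch of that conversion. The column names `tokens` and `ner_tags`, the label order in `LABELS`, the helper `to_row`, and the example sentence with its tags are all assumptions made for illustration, so check them against your dataset before reuse.

```python
# Assumed label order; verify it against the label names of your own dataset.
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}


def to_row(tokens, tags):
    """Build one dataset row with a token list and integer-encoded BIO tags."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return {"tokens": list(tokens), "ner_tags": [LABEL2ID[tag] for tag in tags]}


# Hand-labelled illustrative sentence, not taken from the training data.
row = to_row(
    ["Hugo", "Unda", "habló", "en", "Quito", "."],
    ["B-PER", "I-PER", "O", "O", "B-LOC", "O"],
)
print(row)
```

A list of such rows can then be turned into a dataset object with the library of your choice and passed to the `Trainer`.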
 ## Training Details
+### Training Set Metrics
+| Training set          | Min | Median  | Max  |
+|:----------------------|:----|:--------|:-----|
+| Sentence length       | 1   | 31.8052 | 1238 |
+| Entities per sentence | 0   | 2.2586  | 160  |
+### Training Hyperparameters
+- learning_rate: 1e-05
+- train_batch_size: 16
+- eval_batch_size: 8
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 2
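To make the `linear` scheduler with a warmup ratio of 0.1 concrete, here is a small pure-Python sketch of the resulting learning rate curve. It assumes linear warmup from zero followed by linear decay to zero, with the warmup length rounded up. The function name `linear_lr` is hypothetical, and the total of 1704 optimizer steps is an estimate inferred from the training results table (step 1700 at epoch 1.9953), not a value stated in the card.

```python
import math


def linear_lr(step, total_steps, base_lr=1e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1704  # assumption: about 852 steps per epoch over 2 epochs

for step in (0, 171, 852, 1704):
    print(step, linear_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak learning rate of 1e-05 is reached after 171 steps, before the 200-step evaluation in the results table.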
+### Training Results
+| Epoch  | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
+|:------:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
+| 0.0587 | 50   | 0.4612          | 0.0280               | 0.0007            | 0.0014        | 0.8576              |
+| 0.1174 | 100  | 0.0512          | 0.5000               | 0.0002            | 0.0005        | 0.8609              |
+| 0.1761 | 150  | 0.0254          | 0.7622               | 0.5494            | 0.6386        | 0.9278              |
+| 0.2347 | 200  | 0.0177          | 0.7840               | 0.7135            | 0.7471        | 0.9483              |
+| 0.2934 | 250  | 0.0153          | 0.8072               | 0.7944            | 0.8007        | 0.9662              |
+| 0.3521 | 300  | 0.0175          | 0.8439               | 0.7544            | 0.7966        | 0.9611              |
+| 0.4108 | 350  | 0.0103          | 0.8828               | 0.8108            | 0.8452        | 0.9687              |
+| 0.4695 | 400  | 0.0105          | 0.8674               | 0.8433            | 0.8552        | 0.9724              |
+| 0.5282 | 450  | 0.0098          | 0.8651               | 0.8477            | 0.8563        | 0.9745              |
+| 0.5869 | 500  | 0.0092          | 0.8634               | 0.8306            | 0.8467        | 0.9736              |
+| 0.6455 | 550  | 0.0106          | 0.8556               | 0.8581            | 0.8568        | 0.9758              |
+| 0.7042 | 600  | 0.0096          | 0.8712               | 0.8521            | 0.8616        | 0.9733              |
+| 0.7629 | 650  | 0.0090          | 0.8791               | 0.8420            | 0.8601        | 0.9740              |
+| 0.8216 | 700  | 0.0082          | 0.8883               | 0.8799            | 0.8840        | 0.9769              |
+| 0.8803 | 750  | 0.0081          | 0.8877               | 0.8604            | 0.8739        | 0.9763              |
+| 0.9390 | 800  | 0.0087          | 0.8785               | 0.8738            | 0.8762        | 0.9763              |
+| 0.9977 | 850  | 0.0084          | 0.8777               | 0.8653            | 0.8714        | 0.9767              |
+| 1.0563 | 900  | 0.0081          | 0.8894               | 0.8713            | 0.8803        | 0.9767              |
+| 1.1150 | 950  | 0.0078          | 0.8944               | 0.8708            | 0.8825        | 0.9768              |
+| 1.1737 | 1000 | 0.0079          | 0.8973               | 0.8722            | 0.8846        | 0.9776              |
+| 1.2324 | 1050 | 0.0080          | 0.8792               | 0.8780            | 0.8786        | 0.9783              |
+| 1.2911 | 1100 | 0.0082          | 0.8821               | 0.8574            | 0.8696        | 0.9767              |
+| 1.3498 | 1150 | 0.0075          | 0.8928               | 0.8697            | 0.8811        | 0.9774              |
+| 1.4085 | 1200 | 0.0076          | 0.8919               | 0.8803            | 0.8860        | 0.9792              |
+| 1.4671 | 1250 | 0.0078          | 0.8846               | 0.8695            | 0.8770        | 0.9781              |
+| 1.5258 | 1300 | 0.0074          | 0.8944               | 0.8845            | 0.8894        | 0.9792              |
+| 1.5845 | 1350 | 0.0076          | 0.8922               | 0.8856            | 0.8889        | 0.9796              |
+| 1.6432 | 1400 | 0.0072          | 0.9004               | 0.8799            | 0.8900        | 0.9790              |
+| 1.7019 | 1450 | 0.0076          | 0.8944               | 0.8889            | 0.8916        | 0.9800              |
+| 1.7606 | 1500 | 0.0074          | 0.8962               | 0.8861            | 0.8911        | 0.9800              |
+| 1.8192 | 1550 | 0.0072          | 0.8988               | 0.8886            | 0.8937        | 0.9809              |
+| 1.8779 | 1600 | 0.0074          | 0.8962               | 0.8833            | 0.8897        | 0.9797              |
+| 1.9366 | 1650 | 0.0071          | 0.8976               | 0.8849            | 0.8912        | 0.9799              |
+| 1.9953 | 1700 | 0.0071          | 0.8981               | 0.8842            | 0.8911        | 0.9799              |
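The F1 column is the harmonic mean of precision and recall. As a quick consistency check, the sketch below recomputes it from the rounded values in the final row; `f1_score` is a hypothetical helper written for this example, not part of SpanMarker.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall from the last evaluation step (step 1700).
final_f1 = f1_score(0.8981, 0.8842)
print(round(final_f1, 4))
```

Rounded to four decimals, the result agrees with the reported validation F1 of 0.8911. Rows computed from rounded inputs can differ from the table in the last digit.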
 ### Framework Versions
 - Python: 3.10.12