---
language: 
  - en
license: "apache-2.0"
datasets:
- CoNLL-2003
metrics:
- F1

---

This is a T5-small model fine-tuned on the CoNLL-2003 dataset for named entity recognition (NER).

Example input and output:

- Input: `Recognize all the named entities in this sequence (replace named entities with one of [PER], [ORG], [LOC], [MISC]): When Alice visited New York`
- Output: `When PER visited LOC LOC`
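The prompt format above can be sketched as follows. This is a minimal illustration, not code shipped with this model: it assumes the prompt prefix is exactly the text shown and that inputs and outputs are split on whitespace. `build_prompt` and `output_to_tags` are hypothetical helpers. Output words that are entity labels are tagged with that label, and all other words with `O`.

```python
from typing import List, Optional

# Hypothetical constants/helpers illustrating the prompt format above.
ENTITY_LABELS = ("PER", "ORG", "LOC", "MISC")
PROMPT_PREFIX = (
    "Recognize all the named entities in this sequence "
    "(replace named entities with one of [PER], [ORG], [LOC], [MISC]): "
)


def build_prompt(sentence: str) -> str:
    """Prepend the task instruction to a raw sentence."""
    return PROMPT_PREFIX + sentence


def output_to_tags(sentence: str, output: str) -> Optional[List[str]]:
    """Align a model output with its input sentence, one tag per word.

    Output words that are entity labels keep that label; every other word
    is tagged "O". Returns None if the word counts differ, since such an
    output cannot be aligned word by word (assumes whitespace tokenization).
    """
    words = sentence.split()
    out_words = output.split()
    if len(words) != len(out_words):
        return None
    return [w if w in ENTITY_LABELS else "O" for w in out_words]


print(build_prompt("When Alice visited New York"))
print(output_to_tags("When Alice visited New York", "When PER visited LOC LOC"))
# ['O', 'PER', 'O', 'LOC', 'LOC']
```

Note that a multi-word entity such as "New York" becomes one label per word (`LOC LOC`), which is what keeps the output the same length as the input.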

Evaluation results:

Percentage of complete matches (for comparison with ExT5: https://arxiv.org/pdf/2111.10952.pdf):

| Metric | ExT5-Base | This Model | T5_NER_CONLL_OUTPUTLIST |
| :---: | :---: | :---: | :---: |
| Complete match (%) | 86.53 | 79.03 | TBA |
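A minimal sketch of how a complete-match percentage could be computed, assuming each reference target is written in the same format as the example output and that a prediction counts as a match only if it equals the reference word for word. The function name `complete_match_percentage` is hypothetical; the exact matching rule used for the numbers above is not stated in this card.

```python
from typing import Sequence


def complete_match_percentage(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions identical to their reference, word for word."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    # Compare word lists so differences in spacing are ignored (an assumption).
    matches = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return 100.0 * matches / len(predictions)


preds = ["When PER visited LOC LOC", "PER said"]
refs = ["When PER visited LOC LOC", "ORG said"]
print(complete_match_percentage(preds, refs))  # 50.0
```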



212 of the 3,453 test outputs (6.14%) do not have the same length as their input.
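The percentages follow directly from the reported counts. The sketch below shows one way to run the length check, assuming "length" means the number of whitespace-separated words (the card does not define it); `length_mismatches` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def length_mismatches(inputs: Sequence[str], outputs: Sequence[str]) -> Tuple[int, float]:
    """Count outputs whose word count differs from their input's.

    Returns (count, percentage of all examples). "Length" is assumed to mean
    the number of whitespace-separated words.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    if not inputs:
        return 0, 0.0
    mismatched = sum(len(i.split()) != len(o.split()) for i, o in zip(inputs, outputs))
    return mismatched, 100.0 * mismatched / len(inputs)


# Reproducing the percentages from the counts reported in this card:
print(f"{100 * 212 / 3453:.2f}%")  # 6.14%
print(f"{100 * 27 / 3453:.2f}%")   # 0.78%
```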

F1 score on the subset of the test set whose outputs match the input length:

| Metric | This Model | T5_NER_CONLL_OUTPUTLIST | BERT-Base |
| :---: | :---: | :---: | :---: |
| F1 | 0.8901 | 0.8691 | 0.9240 |

*Caveat: these scores are not computed on the same test subset, because outputs whose length does not match the input are excluded separately for each T5 model. T5_NER_CONLL_OUTPUTLIST has only 27 of 3,453 outputs (0.78%) with mismatched length. The BERT-Base number is taken directly from the BERT paper (https://arxiv.org/pdf/1810.04805.pdf).*
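The card does not say whether F1 is computed per token or per entity span; the standard CoNLL-2003 metric is span-level F1. As one possible reading, the sketch below computes micro-averaged token-level F1 over entity tags, skipping predictions that could not be aligned with their input (represented as `None`). The function name `token_f1` is hypothetical, and token-level scores computed this way are not directly comparable to span-level scores.

```python
from typing import List, Optional, Tuple


def token_f1(pairs: List[Tuple[List[str], Optional[List[str]]]]) -> float:
    """Micro-averaged token-level F1 over entity tags ("O" = non-entity).

    Each pair is (gold_tags, predicted_tags); predicted_tags is None when the
    model output could not be aligned with the input, and such pairs are skipped.
    """
    tp = fp = fn = 0
    for gold, pred in pairs:
        if pred is None:  # length mismatch: excluded from scoring
            continue
        for g, p in zip(gold, pred):
            if p != "O" and p == g:
                tp += 1  # correct entity tag
            else:
                if p != "O":
                    fp += 1  # predicted an entity tag that is wrong
                if g != "O":
                    fn += 1  # missed or mislabeled a gold entity tag
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


pairs = [
    (["O", "PER", "O", "LOC", "LOC"], ["O", "PER", "O", "LOC", "O"]),
    (["PER"], None),  # unaligned output, skipped
]
print(round(token_f1(pairs), 4))  # 0.8
```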