All GLiNER models, including this one, suffer from a common issue: repeated mentions of the same entity are not consistently recognized.
Input:
from gliner import GLiNER
model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")
t0 = "Amy Poehler is a great comedian just like Tina Fey, but Amy is from Minnesota unlike Tina."
labels = ["person","profession","location"]
print(model.predict_entities(t0, labels, threshold=0.3))
Output:
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
[{'start': 0, 'end': 11, 'text': 'Amy Poehler', 'label': 'person', 'score': 0.9780160784721375},
{'start': 23, 'end': 31, 'text': 'comedian', 'label': 'profession', 'score': 0.9096028804779053},
{'start': 42, 'end': 50, 'text': 'Tina Fey', 'label': 'person', 'score': 0.9740211367607117},
{'start': 68, 'end': 77, 'text': 'Minnesota', 'label': 'location', 'score': 0.9995237588882446}]
The issue: the later mentions "Amy" and "Tina" are not recognized as person, even though "Amy Poehler" and "Tina Fey" are. This has also been reported in GLiNER's GitHub repo: https://github.com/urchade/GLiNER/issues/242
I have patched this in my repository: https://github.com/deepanwadhwa/zink
It's not ideal, because it's a workaround outside the model rather than a fix in the model itself, but it works pretty well.
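For illustration, here is a minimal post-processing sketch of this kind of workaround. It is hypothetical and not taken from the zink repository: the function `propagate_entities` copies the label of each detected person to any other whole-word occurrence of the full name or of one of its words, skipping text already covered by a prediction. It only assumes the prediction format shown in the output above (dicts with `start`, `end`, `text`, `label`, `score`).

```python
import re

def propagate_entities(text, entities, labels=("person",)):
    """Hypothetical post-processing: give un-tagged repeat mentions
    (e.g. "Amy" after "Amy Poehler") the label of the detected entity."""
    # Character spans already claimed by a model prediction.
    covered = [(e["start"], e["end"]) for e in entities]

    def overlaps(start, end):
        return any(start < c_end and c_start < end for c_start, c_end in covered)

    added = []
    for ent in entities:
        if ent["label"] not in labels:
            continue
        # Surface forms to look for: the full mention plus each of its words.
        forms = [ent["text"]] + ent["text"].split()
        for form in forms:
            for m in re.finditer(r"\b" + re.escape(form) + r"\b", text):
                if overlaps(m.start(), m.end()):
                    continue
                added.append({"start": m.start(), "end": m.end(), "text": form,
                              "label": ent["label"], "score": ent["score"]})
                covered.append((m.start(), m.end()))
    return sorted(entities + added, key=lambda e: e["start"])

# Usage with the text and predictions from the example above.
t0 = "Amy Poehler is a great comedian just like Tina Fey, but Amy is from Minnesota unlike Tina."
preds = [
    {"start": 0, "end": 11, "text": "Amy Poehler", "label": "person", "score": 0.9780160784721375},
    {"start": 23, "end": 31, "text": "comedian", "label": "profession", "score": 0.9096028804779053},
    {"start": 42, "end": 50, "text": "Tina Fey", "label": "person", "score": 0.9740211367607117},
    {"start": 68, "end": 77, "text": "Minnesota", "label": "location", "score": 0.9995237588882446},
]
result = propagate_entities(t0, preds)
for e in result:
    print(e["start"], e["end"], e["text"], e["label"])
```

Matching single words is a deliberate trade-off: it catches "Amy" and "Tina", but it can also tag an unrelated word that happens to equal part of a name, so a real patch may want extra checks.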
Hopefully the next model's training data will include examples with multiple mentions of the same entity, e.g. "John Doe" and "John" in the same text, with both labeled as person.
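A minimal sketch of what such a training example could look like. The layout (a `tokenized_text` token list plus `ner` spans of `[start_token, end_token_inclusive, label]`) is an assumption about GLiNER's training JSON and should be checked against the repo's training examples; the helper `make_record` is hypothetical. The point it illustrates is that every occurrence is labeled, with longer phrases matched first so "John" inside "John Doe" is not labeled twice.

```python
import json

def make_record(tokens, mentions):
    """Hypothetical helper: label EVERY occurrence of each (phrase, label),
    using an assumed [start_token, end_token_inclusive, label] span layout."""
    ner, covered = [], set()
    # Longest phrases first, so a full name wins over its first name.
    for phrase, label in sorted(mentions, key=lambda m: -len(m[0].split())):
        words = phrase.split()
        n = len(words)
        for i in range(len(tokens) - n + 1):
            span = set(range(i, i + n))
            if tokens[i:i + n] == words and not covered & span:
                ner.append([i, i + n - 1, label])
                covered |= span
    return {"tokenized_text": tokens, "ner": sorted(ner)}

tokens = "John Doe met Jane Smith , and later John thanked Jane .".split()
record = make_record(tokens, [("John Doe", "person"), ("Jane Smith", "person"),
                              ("John", "person"), ("Jane", "person")])
print(json.dumps(record))
```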