All GLiNER models suffer from a common issue, including this one: repeated mentions of the same entity are missed.

#4
by deepanwa - opened

Input:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-decoder-large-v1.0")

t0 = "Amy Poehler is a great comedian just like Tina Fey, but Amy is from Minnesota unlike Tina."
labels = ["person", "profession", "location"]

print(model.predict_entities(t0, labels, threshold=0.3))
```

Output:

```
Fetching 9 files: 100%|██████████████████████████████| 9/9 [00:00<00:00, 71902.35it/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
[{'start': 0, 'end': 11, 'text': 'Amy Poehler', 'label': 'person', 'score': 0.9780160784721375},
{'start': 23, 'end': 31, 'text': 'comedian', 'label': 'profession', 'score': 0.9096028804779053},
{'start': 42, 'end': 50, 'text': 'Tina Fey', 'label': 'person', 'score': 0.9740211367607117},
{'start': 68, 'end': 77, 'text': 'Minnesota', 'label': 'location', 'score': 0.9995237588882446}]
```

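The issue: the second mentions of "Amy" (start 56, end 59) and "Tina" (start 85, end 89) are not recognized as person, even though the full names "Amy Poehler" and "Tina Fey" are. This issue has also been reported in the GLiNER GitHub repository: https://github.com/urchade/GLiNER/issues/242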

I have patched this in my repository: https://github.com/deepanwadhwa/zink

It's not ideal, because it works around the problem outside the model instead of fixing the model itself, but it works pretty well.
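A workaround of this kind can be sketched as a post-processing step on the output of `predict_entities`. This is not the code in the zink repository: `propagate_mentions` is a hypothetical helper that copies each predicted person label to other whole-word occurrences of the full name and of each name part, skipping any match that overlaps an existing prediction. It runs on the predictions printed in the report, so no model download is needed. Reusing the source entity's score for the added mentions is an assumption.

```python
import re
from typing import Dict, List, Sequence


def propagate_mentions(
    text: str, entities: List[Dict], labels: Sequence[str] = ("person",)
) -> List[Dict]:
    """Hypothetical workaround: copy labels to repeated mentions of an entity.

    For each predicted entity whose label is in `labels`, search `text` for the
    full surface form and for each whitespace-separated part of it (so "Amy"
    is derived from "Amy Poehler"). Matches that overlap an existing span are
    skipped; the rest are added with the source entity's label and score.
    """
    result = [dict(e) for e in entities]  # copy so the input list is untouched
    taken = [(e["start"], e["end"]) for e in result]

    def overlaps(start: int, end: int) -> bool:
        return any(start < t_end and t_start < end for t_start, t_end in taken)

    for ent in entities:
        if ent["label"] not in labels:
            continue
        for cand in [ent["text"]] + ent["text"].split():
            # Whole-word match: no word character directly before or after.
            pattern = r"(?<!\w)" + re.escape(cand) + r"(?!\w)"
            for m in re.finditer(pattern, text):
                if overlaps(m.start(), m.end()):
                    continue
                result.append({
                    "start": m.start(),
                    "end": m.end(),
                    "text": m.group(),
                    "label": ent["label"],
                    "score": ent["score"],  # assumption: reuse the source score
                })
                taken.append((m.start(), m.end()))
    return sorted(result, key=lambda e: e["start"])


# Usage on the text and predictions from the report.
text = "Amy Poehler is a great comedian just like Tina Fey, but Amy is from Minnesota unlike Tina."
predicted = [
    {"start": 0, "end": 11, "text": "Amy Poehler", "label": "person", "score": 0.9780160784721375},
    {"start": 23, "end": 31, "text": "comedian", "label": "profession", "score": 0.9096028804779053},
    {"start": 42, "end": 50, "text": "Tina Fey", "label": "person", "score": 0.9740211367607117},
    {"start": 68, "end": 77, "text": "Minnesota", "label": "location", "score": 0.9995237588882446},
]
for e in propagate_mentions(text, predicted):
    print(e["start"], e["end"], e["text"], e["label"])
# Prints:
# 0 11 Amy Poehler person
# 23 31 comedian profession
# 42 50 Tina Fey person
# 56 59 Amy person
# 68 77 Minnesota location
# 85 89 Tina person
```

Matching individual name parts is deliberately loose: it will also tag an ordinary word that happens to equal a name token (for example a person called "May" and the month), so a production patch would likely need stricter rules.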

Hopefully the training data for the next model can include examples with multiple mentions of the same entity, such as "John Doe" and "John" in the same text, where both are labeled as person.
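One such training example could be built as in the sketch below, where the full name "John Doe" and the later bare "John" are both labeled person. The sentence is invented for illustration, `to_token_example` and its regex tokenizer are hypothetical, and the output layout (`tokenized_text` plus `ner` entries of `[start_token, end_token, label]` with inclusive end indices) is an assumption that should be checked against the data format expected by the GLiNER training code in use.

```python
import json
import re
from typing import Dict, List, Tuple


def to_token_example(text: str, char_spans: List[Tuple[int, int, str]]) -> Dict:
    """Turn character-level entity spans into a token-level training record.

    Assumption: the {"tokenized_text": [...], "ner": [[start, end, label]]}
    layout with inclusive token end indices.
    """
    # Hypothetical tokenizer: runs of word characters, or single punctuation marks.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]
    ner = []
    for c_start, c_end, label in char_spans:
        idx = [i for i, (_, s, e) in enumerate(tokens) if s >= c_start and e <= c_end]
        # Reject spans whose edges fall inside a token.
        if not idx or tokens[idx[0]][1] != c_start or tokens[idx[-1]][2] != c_end:
            raise ValueError(f"span {text[c_start:c_end]!r} is not token-aligned")
        ner.append([idx[0], idx[-1], label])
    return {"tokenized_text": [tok for tok, _, _ in tokens], "ner": ner}


# A full name and a later bare first name, both labeled as person.
text = "John Doe joined the team in May. Later, John moved to Boston."
spans = [(0, 8, "person"), (40, 44, "person"), (54, 60, "location")]
print(json.dumps(to_token_example(text, spans)))
```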

Knowledgator Engineering org

Hi @deepanwa, thank you for the feedback. This is probably caused by artifacts in the training dataset, but it could also be something deeper. We will definitely explore it in detail.
