Tags: Token Classification · GLiNER · PyTorch · NER · information extraction · encoder · entity recognition
alexandrlukashov committed (verified)
Commit a5bc351 · Parent(s): b63dafb

Update README.md

Files changed (1): README.md (+1 −1)
```diff
@@ -38,7 +38,7 @@ tags:
 
 **GLiNER** is a Named Entity Recognition (NER) model capable of identifying any entity type using bidirectional transformer encoders (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and to Large Language Models (LLMs), which, despite their flexibility, are costly and too large for resource-constrained scenarios.
 
-The initial GLiNER models were trained mainly on English data. Available multilingual models relied on existing multilingual NER datasets, but we prepared a synthetic dataset, [knowledgator/gliner-multilingual-synthetic](https://huggingface.co/datasets/knowledgator/gliner-multilingual-synthetic), using the Qwen model to annotate the [Fineweb-2](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2) multilingual dataset. To enable broader language coverage, we replaced the `DeBERTa` backbone used in monolingual GLiNER models with `MT5` encoders, improving performance and adaptability across diverse languages.
+The initial GLiNER models were trained mainly on English data. Available multilingual models relied on existing multilingual NER datasets, but we prepared a synthetic dataset, [knowledgator/gliner-multilingual-synthetic](https://huggingface.co/datasets/knowledgator/gliner-multilingual-synthetic), using an LLM to annotate the [Fineweb-2](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2) multilingual dataset. To enable broader language coverage, we replaced the `DeBERTa` backbone used in monolingual GLiNER models with `MT5` encoders, improving performance and adaptability across diverse languages.
 
 Key Advantages Over Previous GLiNER Models:
 * Enhanced performance and generalization capabilities
```
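For context, zero-shot extraction with GLiNER models is typically done through the `gliner` Python package, where entity types are passed as free-form labels at inference time. A minimal sketch, assuming the package's `GLiNER.from_pretrained` / `predict_entities` API and using an illustrative checkpoint id (the actual repo id for this model card is not stated in the diff):

```python
# Sketch only: assumes `pip install gliner` and network access to download
# a checkpoint. The repo id below is a placeholder, not confirmed by this commit.
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/your-multilingual-gliner-checkpoint")

text = "Bill Gates co-founded Microsoft in Albuquerque in 1975."
# Labels are arbitrary, user-defined strings -- no fixed tag set is required.
labels = ["person", "organization", "location", "date"]

# predict_entities returns a list of dicts with keys like
# "text", "label", "start", "end", and "score".
for ent in model.predict_entities(text, labels, threshold=0.5):
    print(ent["text"], "->", ent["label"])
```

Because the label set is supplied per call, the same checkpoint can be reused for different extraction schemas without retraining.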