Tags: Token Classification · GLiNER · PyTorch · NER · information extraction · encoder · entity recognition
alexandrlukashov committed (verified)
Commit a5bc351 · Parent(s): b63dafb

Update README.md

Files changed (1): README.md (+1 −1)
```diff
@@ -38,7 +38,7 @@ tags:
 
 **GLiNER** is a Named Entity Recognition (NER) model capable of identifying any entity type using bidirectional transformer encoders (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and to Large Language Models (LLMs), which, despite their flexibility, are costly and too large for resource-constrained scenarios.
 
-The initial GLiNER models were trained mainly on English data. Available multilingual models relied on existing multilingual NER datasets, but we prepared a synthetic dataset, [knowledgator/gliner-multilingual-synthetic](https://huggingface.co/datasets/knowledgator/gliner-multilingual-synthetic), using the Qwen model to annotate the [Fineweb-2](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2) multilingual dataset. To enable broader language coverage, we replaced the `DeBERTa` backbone used in monolingual GLiNER models with `MT5` encoders, improving performance and adaptability across diverse languages.
+The initial GLiNER models were trained mainly on English data. Available multilingual models relied on existing multilingual NER datasets, but we prepared a synthetic dataset, [knowledgator/gliner-multilingual-synthetic](https://huggingface.co/datasets/knowledgator/gliner-multilingual-synthetic), using an LLM to annotate the [Fineweb-2](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2) multilingual dataset. To enable broader language coverage, we replaced the `DeBERTa` backbone used in monolingual GLiNER models with `MT5` encoders, improving performance and adaptability across diverse languages.
 
 Key Advantages Over Previous GLiNER Models:
 * Enhanced performance and generalization capabilities
```
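For context, zero-shot extraction with GLiNER models is typically done through the `gliner` Python package, where entity types are passed as free-form labels at inference time. A minimal sketch, assuming the package's `GLiNER.from_pretrained` / `predict_entities` API and using an illustrative checkpoint id (the actual repo id for this model card is not stated in the diff):

```python
# Sketch only: assumes `pip install gliner` and network access to download
# a checkpoint. The repo id below is a placeholder, not confirmed by this commit.
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/your-multilingual-gliner-checkpoint")

text = "Bill Gates co-founded Microsoft in Albuquerque in 1975."
# Labels are arbitrary, user-defined strings -- no fixed tag set is required.
labels = ["person", "organization", "location", "date"]

# predict_entities returns a list of dicts with keys like
# "text", "label", "start", "end", and "score".
for ent in model.predict_entities(text, labels, threshold=0.5):
    print(ent["text"], "->", ent["label"])
```

Because the label set is supplied per call, the same checkpoint can be reused for different extraction schemas without retraining.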