---
license: cc-by-sa-4.0
datasets:
- bavarian-nlp/gemini-bavarian-ner-v0.1
language:
- bar
base_model:
- gerturax/gerturax-3
tags:
- GLiNER
- Bavarian
---

# Bavarian GLiNER Model (v0.1)

**GLiNER** is a Named Entity Recognition (NER) model that leverages bidirectional transformer encoders (similar to BERT) to detect any type of entity.
It offers a practical alternative to traditional NER models, which are restricted to predefined entity types, and to Large Language Models (LLMs), which are flexible but often too large and expensive for resource-limited environments.

The initial GLiNER models were trained mainly on English data. Thankfully, [GLiNER-X](https://huggingface.co/collections/knowledgator/gliner-x-684320a3f1220315c651d2f5) improved performance and adaptability across diverse languages using multilingual NER datasets.

However, GLiNER-X does not support Bavarian at the moment, so this repository hosts the first GLiNER model for Bavarian 🥨

The Bavarian GLiNER model uses the strong-performing [GERTuraX-3](https://huggingface.co/gerturax/gerturax-3) as its backbone and was trained on over 100,000 sentences from the [Gemini-powered Bavarian NER Dataset](https://huggingface.co/datasets/bavarian-nlp/gemini-bavarian-ner-v0.1).


# Installation & Usage

To get started, install the latest GLiNER package, including the tokenizers dependency:

```bash
pip3 install -U "gliner[tokenizers]"
```

After that, the Bavarian GLiNER model is ready to use:

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("bavarian-nlp/gliner-bavarian-v0.1")

text = """Oktobafestln woan friaha in Bayern koa Sejtnheit.
 Se hom dozua deand, as eihglogade Meaznbia voam Ofong vo da neien Brausaison
 afz'braucha. D'Wuazln van heiting Mingara Oktobafest gengan 200 Joar zrugg.
 Zan easchnt Moi hods om 17. Oktoba 1810 stottg'fundn.
 Om 12. Oktoba 1810 hod ba da Hozadfeia van Kronprinz Ludwig (spada Ludwig I.)
 und Prinzessin Therese af ana Wiesn voa dena Stodmauan vo Minga a groß's
 Pferdlrenna stottg'fundn."""

label_set = ["location", "organization", "person", "prince", "event", "date"]

entities = model.predict_entities(text, label_set, threshold=0.5)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```

This prints:

```text
Oktobafestln => event
Bayern => location
Mingara Oktobafest => event
17. Oktoba 1810 => date
12. Oktoba 1810 => date
Ludwig => prince
Ludwig I. => prince
Therese => prince
Minga => location
```
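
The usage example above reads the `text` and `label` keys of each entity dictionary returned by `predict_entities`. For downstream use it is often handy to collect the mentions per label. The following is a minimal sketch under that assumption: `group_by_label` is a hypothetical helper (not part of the GLiNER package), and `sample_entities` is a stand-in list copied from the output above, so the snippet runs without loading the model.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group entity texts by label, keeping first-seen order and dropping duplicates.

    Hypothetical helper: it only assumes each entity is a dict with
    "text" and "label" keys, as used in the usage example above.
    """
    grouped = defaultdict(list)
    for entity in entities:
        texts = grouped[entity["label"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)


# Stand-in for the result of model.predict_entities(...),
# copied from the output shown above.
sample_entities = [
    {"text": "Oktobafestln", "label": "event"},
    {"text": "Bayern", "label": "location"},
    {"text": "Mingara Oktobafest", "label": "event"},
    {"text": "17. Oktoba 1810", "label": "date"},
    {"text": "12. Oktoba 1810", "label": "date"},
    {"text": "Ludwig", "label": "prince"},
    {"text": "Ludwig I.", "label": "prince"},
    {"text": "Therese", "label": "prince"},
    {"text": "Minga", "label": "location"},
]

grouped = group_by_label(sample_entities)
for label, texts in grouped.items():
    print(label, "=>", texts)
```

In a real run, the list returned by `model.predict_entities(text, label_set, threshold=0.5)` can be passed to the helper in place of `sample_entities`.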

# Changelog

* 09.07.2025: Initial version of this repo. More details about evaluation and pretraining will follow!

# Licence

The Bavarian GLiNER model is licensed under CC-BY-SA-4.0.