---
language:
- de
tags:
- flair
- sequence-tagger-model
- part-of-speech
- tweets
---

# Fine-grained POS Tagging of German Tweets

This Flair model was trained on the German Tweets dataset presented in the paper
[Fine-grained POS Tagging of German Tweets](https://pdfs.semanticscholar.org/82c9/90aa15e2e35de8294b4a721785da1ede20d0.pdf)
by Ines Rehbein.

It achieves an accuracy of 92.88% on the development set and **93.16%** on the final test set.
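
To make the reported metric concrete, here is a minimal sketch of how tagging accuracy is commonly computed. It assumes accuracy is measured per token, which is the usual convention for POS tagging but is not spelled out above. The helper name `token_accuracy` and the example tags are hypothetical and are not taken from the training code.

```python
def token_accuracy(gold_tags, predicted_tags):
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


# Hypothetical gold and predicted tags for a four-token tweet (illustration only).
gold = ["ADDRESS", "PPER", "PTKNEG", "EMO"]
predicted = ["ADDRESS", "PPER", "ADV", "EMO"]
print(f"{token_accuracy(gold, predicted):.2%}")  # prints 75.00%
```

In this toy example one of the four tags is wrong, so the accuracy is 3/4.
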

## Training

All training code is available in the [flair-experiments](https://github.com/stefan-it/flair-experiments/tree/master/pos-twitter-german) repository.

The model follows the training strategy proposed in the original [Flair](https://aclanthology.org/C18-1139/) paper:
German FastText embeddings and German Flair embeddings are stacked and passed into a BiLSTM-CRF sequence labeler, achieving robust
state-of-the-art results on POS tagging of German tweets.
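
The architecture described above could be assembled in Flair roughly as follows. This is a sketch under stated assumptions, not the released training script: the function name `build_tagger`, the hidden size of 256, and the embedding identifiers `"de"`, `"de-forward"` and `"de-backward"` are illustrative choices, and the linked repository remains the authoritative source for the actual configuration.

```python
def build_tagger(tag_dictionary, hidden_size=256):
    """Build a BiLSTM-CRF tagger on stacked FastText and Flair embeddings.

    `tag_dictionary` is the label dictionary of the training corpus.
    The default hidden size is an assumption, not the released setting.
    """
    # Imported inside the function so the sketch can be read and loaded
    # without triggering any embedding downloads.
    from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings
    from flair.models import SequenceTagger

    # Assumed identifiers: German FastText word embeddings plus the
    # forward and backward German Flair language models.
    embeddings = StackedEmbeddings([
        WordEmbeddings("de"),
        FlairEmbeddings("de-forward"),
        FlairEmbeddings("de-backward"),
    ])

    # use_crf=True adds the CRF decoding layer on top of the BiLSTM.
    return SequenceTagger(
        hidden_size=hidden_size,
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type="pos",
        use_crf=True,
    )
```

Calling `build_tagger` requires Flair to be installed and downloads the embeddings on first use. The label dictionary would come from the training corpus.
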

The full training log can be found [here](training.log).

## Demo: How to use in Flair

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the pretrained tagger
model = SequenceTagger.load('flair/de-pos-fine-grained')

# Raw string keeps the backslash in "\o/"; use_tokenizer=False splits on whitespace only
sent = Sentence(r"@Sneeekas Ich nicht \o/", use_tokenizer=False)
model.predict(sent)

print(sent)
```

This yields the following output:

```text
Sentence[4]: "@Sneeekas Ich nicht \o/" → ["@Sneeekas"/ADDRESS, "Ich"/PPER, "nicht"/PTKNEG, "\o/"/EMO]
```