---
license: apache-2.0
language:
- en
library_name: gliner
pipeline_tag: token-classification
---

# GLiNER-Large (Reproduction) Model

This model is a reproduction of GLiNER-large; the training hyperparameters differ from those of the original model.

# Hyperparameters

The training hyperparameters are detailed in `deberta.yaml`.

In addition to the settings in `deberta.yaml`, I manually set `lr_scheduler_type` to `cosine_with_min_lr` and `lr_scheduler_kwargs` to `{"min_lr_rate": 0.01}` in `train.py`:

```python
from transformers import TrainingArguments  # train.py in the GLiNER repo may import an equivalent class instead

training_args = TrainingArguments(
    ...
    # cosine decay with a non-zero floor instead of the default schedule
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.01},
    ...
)
```

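For context, `min_lr_rate` sets the floor of the cosine schedule as a fraction of the peak learning rate, so `0.01` means the learning rate decays to about 1% of its peak instead of to zero. Below is a minimal, self-contained sketch of that schedule shape (not the exact `transformers` implementation); the warmup handling and the `peak_lr` value are illustrative only.

```python
import math

def cosine_with_min_lr_factor(step, total_steps, warmup_steps=0, min_lr_rate=0.01):
    """Learning-rate multiplier: linear warmup, then cosine decay to a floor of min_lr_rate."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # rescale the cosine so the multiplier ends at min_lr_rate instead of 0
    return (1.0 - min_lr_rate) * cosine + min_lr_rate

peak_lr = 1e-5  # placeholder value, not taken from deberta.yaml
print(cosine_with_min_lr_factor(10_000, 10_000) * peak_lr)  # ~= 0.01 * peak_lr at the end of training
```
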
NOTE: The results are not perfectly stable across runs; I suspect the random shuffling of the dataset is the reason.

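If the variance does come from data shuffling, pinning the seeds is one way to check. The snippet below only illustrates the relevant `TrainingArguments` fields (`seed` also covers weight initialization and dropout, `data_seed` controls the sampler); it is not a setting used for the released weights.

```python
from transformers import TrainingArguments

# hypothetical reproducibility settings, not part of the released training configuration
training_args = TrainingArguments(
    output_dir="out",
    seed=42,       # global seed (model init, dropout, ...)
    data_seed=42,  # seed used for data sampling / shuffling
)
```
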
# Weights

Two sets of weights are provided: the checkpoint after 4k iterations, which has the best zero-shot evaluation performance, and the model after full training (10k iterations).

| Model | link | AI | literature | music | politics | science | movie | restaurant | Average |
| :--------: | :-------------------------------------------------------------------: | :---: | :--------: | :---: | :------: | :-----: | :---: | :--------: | :-----: |
| iter_4000 | [🤗](https://huggingface.co/liuyanyi/gliner_large_reproduce_iter_4000) | 56.7 | 65.1 | 69.6 | 74.2 | 60.9 | 60.6 | 39.7 | 61.0 |
| iter_10000 | [🤗](https://huggingface.co/liuyanyi/gliner_large_reproduce) | 55.1 | 62.9 | 68.3 | 71.6 | 57.3 | 58.4 | 40.5 | 59.2 |
| Paper | [🤗](https://huggingface.co/urchade) | 57.2 | 64.4 | 69.6 | 72.6 | 62.6 | 57.2 | 42.9 | 60.9 |

# Usage

See https://github.com/urchade/GLiNER for installation and usage instructions.
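
A minimal loading-and-prediction sketch following the upstream GLiNER README (the text and label set below are just examples):

```python
from gliner import GLiNER

# load the reproduced checkpoint from the Hub
model = GLiNER.from_pretrained("liuyanyi/gliner_large_reproduce")

text = "Steve Jobs co-founded Apple in Cupertino in 1976."
labels = ["person", "organization", "location", "date"]

# zero-shot NER over the provided label set
entities = model.predict_entities(text, labels, threshold=0.5)
for entity in entities:
    print(entity["text"], "=>", entity["label"])
```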