---
license: apache-2.0
language:
- en
library_name: gliner
pipeline_tag: token-classification
datasets:
- gretelai/synthetic_pii_finance_multilingual
base_model:
- urchade/gliner_small-v2.1
---

# Gravitee GLiNER PII Detection

GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and to Large Language Models (LLMs), which are flexible but too large and costly for resource-constrained scenarios.

The entity types this model was evaluated on are listed under Evaluation results. Because GLiNER takes its labels at inference time, the model can also identify other entity types.

Evaluation results:

| Entity                    | Precision | Recall   | F1 Score |
|---------------------------|-----------|----------|----------|
| email                     | 0.937213  | 0.925870 | 0.931507 |
| phone_number              | 0.898515  | 0.876812 | 0.887531 |
| name                      | 0.929052  | 0.824776 | 0.873814 |
| date_of_birth             | 0.813953  | 0.937500 | 0.871369 |
| date                      | 0.888942  | 0.839801 | 0.863673 |
| location                  | 0.881579  | 0.829833 | 0.854924 |
| company                   | 0.821222  | 0.873162 | 0.846396 |
| ipv4                      | 0.791667  | 0.890625 | 0.838235 |
| ssn                       | 0.897959  | 0.785714 | 0.838095 |
| bank_routing_number       | 0.898305  | 0.746479 | 0.815385 |
| driver_license_number     | 0.918367  | 0.725806 | 0.810811 |
| passport_number           | 0.918367  | 0.714286 | 0.803571 |
| credit_card_security_code | 0.830986  | 0.756410 | 0.791946 |
| time                      | 0.834297  | 0.674455 | 0.745909 |

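
The F1 column in the table is consistent with the harmonic mean of precision and recall, F1 = 2PR / (P + R). The following is a minimal sketch that recomputes it for one row. The helper name `f1_score` is illustrative and not part of the GLiNER library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the "email" row of the evaluation table.
email_f1 = f1_score(0.937213, 0.925870)
print(round(email_f1, 4))
```

The printed value agrees with the reported email F1 score to four decimal places. The same check can be applied to any other row.
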
## Installation

To use this model, you must install the GLiNER Python library:

```
!pip install gliner
```

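
Before loading the model, it can help to confirm that the packages are importable in the current environment. The sketch below uses only the standard library. The helper name `missing_packages` is hypothetical, and checking for `onnxruntime` is an assumption based on the `load_onnx_model=True` option used when loading this model.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "onnxruntime" is checked on the assumption that the ONNX model files are used.
missing = missing_packages(["gliner", "onnxruntime"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```
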
## Usage

Once you've installed the GLiNER library, you can import the `GLiNER` class. You can then load this model using `GLiNER.from_pretrained` and predict entities with `predict_entities`.

```python
from gliner import GLiNER

# To use the quantized model, pass "model.quant.onnx" as the onnx_model_file argument
model = GLiNER.from_pretrained(
    "gravitee-io/gliner-pii-detection",
    load_onnx_model=True,
    load_tokenizer=True,
    onnx_model_file="model.onnx",
)

text = """
Hey, just a quick update - I talked to David yesterday.
He sent over the files from his private email ([email protected]), and we should be careful with his SSN: 123-45-6789.
He mentioned his new address is 123 Maple Street in New York.
His PC address is 192.168.1.100.
"""

# Entity types to look for in the text
labels = ["name", "email", "ssn", "street_address", "date", "ipv4"]

entities = model.predict_entities(text, labels)

# Print each detected span with its label and confidence score
for entity in entities:
    print(entity["text"], "=>", entity["label"], "=>", entity["score"])
```

```
David => name => 0.9066112041473389
yesterday => date => 0.9482080340385437
[email protected] => email => 0.9911587834358215
123-45-6789 => ssn => 0.8612598180770874
123 Maple Street in New York => street_address => 0.9869663715362549
192.168.1.100 => ipv4 => 0.9810121059417725
```
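
A common next step is to mask the detected spans. The sketch below assumes that each entity returned by `predict_entities` is a dict with `start` and `end` character offsets in addition to the `text`, `label`, and `score` keys printed above, and that spans do not overlap. The `redact` helper and its `min_score` parameter are hypothetical and not part of GLiNER. The sample entity list is hand-written for illustration and is not model output.

```python
def redact(text, entities, min_score=0.5):
    """Replace each detected span with a [LABEL] placeholder.

    Entities scoring below `min_score` are left untouched.
    Assumes the spans do not overlap.
    """
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace from the end of the string so earlier offsets stay valid.
    for entity in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = "[" + entity["label"].upper() + "]"
        text = text[: entity["start"]] + placeholder + text[entity["end"]:]
    return text


# Hand-written entities for illustration (not actual model output).
sample_text = "David's SSN is 123-45-6789."
sample_entities = [
    {"start": 0, "end": 5, "text": "David", "label": "name", "score": 0.91},
    {"start": 15, "end": 26, "text": "123-45-6789", "label": "ssn", "score": 0.86},
]
print(redact(sample_text, sample_entities))
```

Raising `min_score` trades recall for precision: fewer spans are masked, but those that are masked carry higher model confidence.
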