We are pleased to announce the new line of universal token classification models 🔥

knowledgator/universal-token-classification-65a3a5d3f266d20b2e05c34d

It can perform various information extraction tasks by analysing input prompts and recognizing parts of texts that satisfy prompts. In comparison with the first version, the second one is more general and can be recognised as entities, whole sentences, and even paragraphs.

The model can be used for the following tasks:
* Named entity recognition (NER);
* Open information extraction;
* Question answering;
* Relation extraction;
* Coreference resolution;
* Text cleaning;
* Summarization;

How to use:

from utca.core import (
    AddData,
    RenameAttribute,
    Flush
)
from utca.implementation.predictors import (
    TokenSearcherPredictor, TokenSearcherPredictorConfig
)
from utca.implementation.tasks import (
    TokenSearcherNER,
    TokenSearcherNERPostprocessor,
)
predictor = TokenSearcherPredictor(
    TokenSearcherPredictorConfig(
        device="cuda:0",
        model="knowledgator/UTC-DeBERTa-base-v2"
    )
)
ner_task = TokenSearcherNER(
    predictor=predictor,
    postprocess=[TokenSearcherNERPostprocessor(
        threshold=0.5
    )]
)

ner_task = TokenSearcherNER()

pipeline = (        
    AddData({"labels": ["scientist", "university", "city"]})         
    | ner_task
    | Flush(keys=["labels"])
    | RenameAttribute("output", "entities")
)
res = pipeline.run({
    "text": """Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University, has recently published a paper in the prestigious journal "Nature Neuroscience". """
})

updated 2 models 10 months ago

la-min/GENI_GPT_Health

Updated May 9, 2024

la-min/mGPT_Myanmar

Updated Apr 24, 2024