Onyx Information Content Classification Using SetFit with sentence-transformers/paraphrase-mpnet-base-v2 as the Base Model

This model is used by the Onyx Enterprise Search system to determine whether a short text segment contains information that, on its own, could help answer a RAG-type (retrieval-augmented generation) question.

It is based on the SetFit approach, using sentence-transformers/paraphrase-mpnet-base-v2 as the Sentence Transformer embedding model. A trained scikit-learn LogisticRegression instance serves as the classification head.
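To make the classification step concrete, here is a minimal pure-Python sketch of how a binary logistic-regression head turns one embedding into a probability, p = sigmoid(w · x + b). It assumes a binary head (for which scikit-learn's `predict_proba` reduces to the sigmoid of the decision function). The weights, bias and 3-dimensional vector are hypothetical toy values; the real head works on the 768-dimensional embeddings produced by paraphrase-mpnet-base-v2, and its learned parameters are not shown here.

```python
import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_head_probability(
    embedding: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Return P(class 1) for one embedding under a binary logistic-regression head."""
    if len(embedding) != len(weights):
        raise ValueError("embedding and weights must have the same dimension")
    # Linear score: w . x + b
    score = sum(w * x for w, x in zip(weights, embedding)) + bias
    return _sigmoid(score)


def predict_label(
    embedding: Sequence[float], weights: Sequence[float], bias: float, threshold: float = 0.5
) -> int:
    # Hard label: 1 if the class-1 probability reaches the threshold, else 0
    return int(logistic_head_probability(embedding, weights, bias) >= threshold)


# Hypothetical toy values (real embeddings are 768-dimensional)
toy_embedding = [0.2, -0.1, 0.4]
toy_weights = [1.5, 0.3, 2.0]
toy_bias = -0.5
p = logistic_head_probability(toy_embedding, toy_weights, toy_bias)
```

In the real model the embedding step and the head are learned separately, which is why the head can stay this simple.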

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
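Step 1 trains on pairs of examples rather than on single examples. The sketch below illustrates the idea in plain Python: every pair drawn from a labelled few-shot set becomes a target of 1.0 when both texts share a label and 0.0 otherwise, and a similarity loss (SetFit defaults to a cosine-similarity loss) then pulls same-label embeddings together and pushes different-label ones apart. This is a simplified, exhaustive version; the SetFit library samples and balances pairs itself. The example texts and the meaning of labels 1 and 0 are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def make_contrastive_pairs(
    texts: List[str], labels: List[int]
) -> List[Tuple[str, str, float]]:
    """Build (text_a, text_b, target) triples: 1.0 for same-label pairs, 0.0 otherwise."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    pairs = []
    # Every unordered pair of labelled examples yields one training pair
    for (t_a, y_a), (t_b, y_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if y_a == y_b else 0.0
        pairs.append((t_a, t_b, target))
    return pairs


# Hypothetical few-shot set (assumed: 1 = informative on its own, 0 = not)
texts = [
    "Paris is the capital of France.",
    "The Q3 revenue was $4.2M.",
    "See the section above.",
    "Thanks, talk soon!",
]
labels = [1, 1, 0, 0]
pairs = make_contrastive_pairs(texts, labels)
```

Because n labelled examples yield n(n-1)/2 pairs, even a handful of examples gives the embedding model many training signals, which is what makes the approach work in the few-shot setting.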

About Onyx

Model Details

Core Model Description

SetFit Resources

Uses

Use for Inference

The model is for use by the Onyx Enterprise Search system.

To test it locally, first install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("onyx-dot-app/information-content-model")
# Run inference: returns the predicted label
preds = model("Paris is in France")
# Or return class probabilities instead of a hard label
pred_probability = model.predict_proba("Paris is in France")
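In a retrieval pipeline the probabilities would typically be used to keep or drop candidate passages. The helper below is a hypothetical sketch, not part of SetFit or Onyx: it accepts any object with a SetFit-style `predict_proba` method, and it assumes that probability column 1 is the "contains useful information" class and that 0.5 is a sensible cut-off. Neither assumption is stated in this card, so check the model's label mapping before relying on it.

```python
from typing import Any, List, Sequence


def filter_informative(
    model: Any,
    passages: Sequence[str],
    threshold: float = 0.5,
    positive_index: int = 1,
) -> List[str]:
    """Keep passages whose positive-class probability meets `threshold`.

    `model` is anything with a SetFit-style `predict_proba(list_of_str)` method
    returning one row of class probabilities per input.
    ASSUMPTION: column `positive_index` (default 1) is the "informative" class.
    """
    if not passages:
        # Avoid calling the model on an empty batch
        return []
    probs = model.predict_proba(list(passages))
    # float() works for plain floats and for 0-dim torch tensors alike
    return [
        text
        for text, row in zip(passages, probs)
        if float(row[positive_index]) >= threshold
    ]
```

With the real model, pass the loaded `SetFitModel` directly, e.g. `filter_informative(model, ["Paris is in France", "See above."])`. SetFit's `predict_proba` returns a torch tensor by default, and the helper indexes its rows as shown.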

Framework Versions

  • Python: 3.11.10
  • SetFit: 1.1.1
  • Sentence Transformers: 3.4.1
  • Transformers: 4.49.0
  • PyTorch: 2.6.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0
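To check whether a local environment matches the versions above, a small standard-library script can compare installed distributions against this list. This is a convenience sketch: other versions may well work, and the exact-match check is stricter than necessary. It assumes the PyPI distribution names shown (PyTorch is published as `torch`).

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Versions listed above, keyed by PyPI distribution name
EXPECTED_VERSIONS: Dict[str, str] = {
    "setfit": "1.1.1",
    "sentence-transformers": "3.4.1",
    "transformers": "4.49.0",
    "torch": "2.6.0",
    "datasets": "3.3.2",
    "tokenizers": "0.21.0",
}
EXPECTED_PYTHON = "3.11.10"


def _installed_version(name: str) -> Optional[str]:
    # Return the installed version string, or None if the package is missing
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(
    expected: Dict[str, str],
    get_version: Callable[[str], Optional[str]] = _installed_version,
) -> Dict[str, Optional[str]]:
    """Return {package: installed_version_or_None} for packages that differ."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in expected.items():
        found = get_version(name)
        # Ignore local build tags such as "+cu124" on PyTorch wheels
        base = found.split("+", 1)[0] if found is not None else None
        if base != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    print(f"Python: expected {EXPECTED_PYTHON}, found {platform.python_version()}")
    for pkg, found in find_mismatches(EXPECTED_VERSIONS).items():
        print(f"{pkg}: expected {EXPECTED_VERSIONS[pkg]}, found {found or 'not installed'}")
```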

Citation

BibTeX (SetFit Approach)

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}

Model size: 109M params (Safetensors, tensor type F32)