metadata
language:
  - en
  - es
license: apache-2.0
tags:
  - sentence-transformers
  - cross-encoder
  - generated_from_trainer
  - dataset_size:578402
  - loss:BinaryCrossEntropyLoss
base_model: EuroBERT/EuroBERT-210m
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
  - map
  - mrr@10
  - ndcg@10
model-index:
  - name: EuroBERT-210m trained on GooAQ
    results:
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: gooaq dev
          type: gooaq-dev
        metrics:
          - type: map
            value: 0.7097
            name: Map
          - type: mrr@10
            value: 0.7089
            name: Mrr@10
          - type: ndcg@10
            value: 0.7579
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoMSMARCO R100
          type: NanoMSMARCO_R100
        metrics:
          - type: map
            value: 0.463
            name: Map
          - type: mrr@10
            value: 0.4452
            name: Mrr@10
          - type: ndcg@10
            value: 0.5106
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoNFCorpus R100
          type: NanoNFCorpus_R100
        metrics:
          - type: map
            value: 0.3363
            name: Map
          - type: mrr@10
            value: 0.5204
            name: Mrr@10
          - type: ndcg@10
            value: 0.3632
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoNQ R100
          type: NanoNQ_R100
        metrics:
          - type: map
            value: 0.4738
            name: Map
          - type: mrr@10
            value: 0.4783
            name: Mrr@10
          - type: ndcg@10
            value: 0.5182
            name: Ndcg@10
      - task:
          type: cross-encoder-nano-beir
          name: Cross Encoder Nano BEIR
        dataset:
          name: NanoBEIR R100 mean
          type: NanoBEIR_R100_mean
        metrics:
          - type: map
            value: 0.4244
            name: Map
          - type: mrr@10
            value: 0.4813
            name: Mrr@10
          - type: ndcg@10
            value: 0.464
            name: Ndcg@10
datasets:
  - sentence-transformers/gooaq

Fine-Tuned Model

fjmgAI/rerank1-210M-EuroBERT

Base Model

EuroBERT/EuroBERT-210m

Fine-Tuning Method

This is a Cross Encoder model fine-tuned from EuroBERT/EuroBERT-210m using the sentence-transformers library. It computes a relevance score for each pair of texts, which can be used for text reranking and semantic search.

Dataset

sentence-transformers/gooaq

Description

GooAQ is a collection of question-answer pairs gathered from Google.

Fine-Tuning Details

  • The model was trained on 578,402 training samples from the sentence-transformers/gooaq dataset.

Cross Encoder Reranking

  • Dataset: gooaq-dev

Metric Value
map 0.7097 (+0.1786)
mrr@10 0.7089 (+0.1850)
ndcg@10 0.7579 (+0.1667)
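For readers unfamiliar with the three metrics, the following is a minimal sketch of how each one can be computed for a single query, given the relevance labels of the reranked documents in rank order. The function names and the example labels are illustrative assumptions, and the evaluator's own implementation may differ in detail; the reported values are averages of such per-query scores.

```python
import math

def average_precision(relevance):
    """AP for one ranked list of 0/1 relevance labels (best-ranked first)."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant position
    return precision_sum / hits if hits else 0.0

def reciprocal_rank_at_k(relevance, k=10):
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(relevance, k=10):
    """DCG of the top k divided by the DCG of an ideal ordering."""
    def dcg(labels):
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(labels, start=1))
    ideal = sorted(relevance, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(relevance[:k]) / ideal_dcg if ideal_dcg else 0.0

# Hypothetical example: relevant documents end up at ranks 2 and 4.
example = [0, 1, 0, 1, 0]
ap = average_precision(example)        # 0.5
rr = reciprocal_rank_at_k(example)     # 0.5
ndcg = ndcg_at_k(example)              # roughly 0.651
print(ap, rr, round(ndcg, 3))
```

MAP is the mean of the per-query AP values, and MRR@10 and NDCG@10 are the corresponding means over all evaluation queries.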

Cross Encoder Reranking

  • Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
  • Evaluated with CrossEncoderRerankingEvaluator with these parameters:
    {
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric    NanoMSMARCO_R100   NanoNFCorpus_R100   NanoNQ_R100
map       0.4630 (-0.0266)   0.3363 (+0.0753)    0.4738 (+0.0542)
mrr@10    0.4452 (-0.0323)   0.5204 (+0.0206)    0.4783 (+0.0516)
ndcg@10   0.5106 (-0.0298)   0.3632 (+0.0381)    0.5182 (+0.0176)

Cross Encoder Nano BEIR

  • Dataset: NanoBEIR_R100_mean
  • Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ],
        "rerank_k": 100,
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric Value
map 0.4244 (+0.0343)
mrr@10 0.4813 (+0.0133)
ndcg@10 0.4640 (+0.0086)
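The NanoBEIR_R100_mean row appears to be the unweighted arithmetic mean of the three per-dataset scores reported in the previous table. The short check below reproduces it under that assumption; the dictionary layout is only illustrative, while the numbers are copied from the per-dataset table.

```python
from statistics import mean

# Per-dataset scores copied from the NanoMSMARCO / NanoNFCorpus / NanoNQ table
per_dataset = {
    "map":     {"NanoMSMARCO_R100": 0.4630, "NanoNFCorpus_R100": 0.3363, "NanoNQ_R100": 0.4738},
    "mrr@10":  {"NanoMSMARCO_R100": 0.4452, "NanoNFCorpus_R100": 0.5204, "NanoNQ_R100": 0.4783},
    "ndcg@10": {"NanoMSMARCO_R100": 0.5106, "NanoNFCorpus_R100": 0.3632, "NanoNQ_R100": 0.5182},
}

# Unweighted mean over the three datasets, rounded to four decimals
nanobeir_mean = {
    metric: round(mean(scores.values()), 4)
    for metric, scores in per_dataset.items()
}
print(nanobeir_mean)
# {'map': 0.4244, 'mrr@10': 0.4813, 'ndcg@10': 0.464}
```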

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("fjmgAI/rerank1-210M-EuroBERT", trust_remote_code=True)
# Get scores for pairs of texts
pairs = [
    ['what are the risks with taking statins?', "['Muscle pain and damage. One of the most common complaints of people taking statins is muscle pain. ... ', 'Liver damage. Occasionally, statin use could cause an increase in the level of enzymes that signal liver inflammation. ... ', 'Increased blood sugar or type 2 diabetes. ... ', 'Neurological side effects.']"],
    ['what are the risks with taking statins?', 'Doctors discovered that statins can help lower blood pressure, as well as lower cholesterol. Statins are often prescribed to people with high cholesterol. Too much cholesterol in your blood increases your risk of heart attacks and strokes.'],
    ['what are the risks with taking statins?', 'Lipitor and Crestor are both effective statins that lower levels of “bad” cholesterol and increase levels of “good” cholesterol. While Crestor is the more potent statin, both medications are effective and have slightly different side effects and drug interactions.'],
    ['what are the risks with taking statins?', "About simvastatin Simvastatin belongs to a group of medicines called statins. It's used to lower cholesterol if you've been diagnosed with high blood cholesterol. It's also taken to prevent heart disease, including heart attacks and strokes."],
    ['what are the risks with taking statins?', 'Zetia works to lower cholesterol in a new way different from the statins: it inhibits the absorption of cholesterol in the small intestine, whereas the statins work by blocking cholesterol production in the liver.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'what are the risks with taking statins?',
    [
        "['Muscle pain and damage. One of the most common complaints of people taking statins is muscle pain. ... ', 'Liver damage. Occasionally, statin use could cause an increase in the level of enzymes that signal liver inflammation. ... ', 'Increased blood sugar or type 2 diabetes. ... ', 'Neurological side effects.']",
        'Doctors discovered that statins can help lower blood pressure, as well as lower cholesterol. Statins are often prescribed to people with high cholesterol. Too much cholesterol in your blood increases your risk of heart attacks and strokes.',
        'Lipitor and Crestor are both effective statins that lower levels of “bad” cholesterol and increase levels of “good” cholesterol. While Crestor is the more potent statin, both medications are effective and have slightly different side effects and drug interactions.',
        "About simvastatin Simvastatin belongs to a group of medicines called statins. It's used to lower cholesterol if you've been diagnosed with high blood cholesterol. It's also taken to prevent heart disease, including heart attacks and strokes.",
        'Zetia works to lower cholesterol in a new way different from the statins: it inhibits the absorption of cholesterol in the small intestine, whereas the statins work by blocking cholesterol production in the liver.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
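In a retrieval pipeline, a cross-encoder is usually applied as a second stage: a fast first-stage retriever proposes candidates, and the reranker rescores each (query, candidate) pair. The sketch below shows that wiring in plain Python. The helper `rerank` and the scorer `toy_overlap_scorer` are hypothetical names introduced for illustration; the toy scorer is only a stand-in so the example runs without downloading the model, and in practice `model.predict` from the snippet above could be passed as `score_pairs` instead.

```python
def rerank(query, candidates, score_pairs, top_k=3):
    """Score every (query, candidate) pair and return the top_k best candidates."""
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)  # one score per pair, higher means more relevant
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_k]]

def toy_overlap_scorer(pairs):
    """Hypothetical stand-in for a cross-encoder: fraction of query words found in the text."""
    scores = []
    for query, doc in pairs:
        query_words = set(query.lower().split())
        doc_words = set(doc.lower().split())
        scores.append(len(query_words & doc_words) / len(query_words) if query_words else 0.0)
    return scores

# Candidates as they might come back from a first-stage retriever (made-up texts)
candidates = [
    "Zetia inhibits the absorption of cholesterol in the small intestine.",
    "Muscle pain is one of the risks with taking statins.",
    "Statins are often prescribed to people with high cholesterol.",
]

top = rerank("what are the risks with taking statins", candidates, toy_overlap_scorer, top_k=2)
for doc, score in top:
    print(f"{score:.3f}  {doc}")
```

`model.rank` already performs this sort internally; the explicit version is mainly useful when the candidate texts carry extra fields (IDs, URLs) that need to travel with the scores.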

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.0.2
  • Transformers: 4.51.2
  • PyTorch: 2.6.0+cu126
  • Accelerate: 1.6.0
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Purpose

This fine-tuned reranker targets Spanish and English applications. As a cross-encoder, it reorders candidate results by scoring each query-passage pair jointly rather than by comparing separately computed embeddings, which makes it suited to improving question-answering and document retrieval pipelines.

  • Developed by: fjmgAI
  • License: apache-2.0