b1ade-embed is a small but efficient embedding model for RAG. On the legacy MTEB leaderboard (through 2024), b1ade-embed ranked #1 in the STS category and placed competitively in other important task categories such as reranking, retrieval, and classification. The model was trained using a combination of:

  1. Model merging
    • bert-large-uncased
    • WhereIsAI/UAE-Large-V1
    • BAAI/bge-large-en-v1.5
    • mixedbread-ai/mxbai-embed-large-v1
    • avsolatorio/GIST-large-Embedding-v0
  2. Knowledge distillation from larger models
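Model merging can be as simple as a weighted average of the constituent checkpoints' parameters. The card does not specify the exact recipe, so the sketch below is only illustrative: plain floats stand in for real parameter tensors, and the uniform merge weights are an assumption.

```python
def merge_state_dicts(state_dicts, weights=None):
    """Weighted average of per-parameter values across checkpoints.

    In a real merge each value would be a tensor from a model's
    state_dict; plain floats keep the sketch self-contained.
    """
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n  # assumption: uniform weights
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged

# Toy stand-ins for two checkpoints sharing the same architecture
sd_a = {"encoder.layer0.weight": 0.2}
sd_b = {"encoder.layer0.weight": 0.6}
merged = merge_state_dicts([sd_a, sd_b])
print(merged["encoder.layer0.weight"])
```

Merging only works when the checkpoints share an architecture, which holds here: all the listed source models are BERT-large-shaped encoders.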

To use this model:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("w601sxs/b1ade-embed")
model = AutoModel.from_pretrained("w601sxs/b1ade-embed")
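The model returns per-token hidden states, so you still need a pooling step to get one vector per sentence. Attention-mask-aware mean pooling is a common choice (the card does not state which pooling b1ade-embed expects, so treat this as an assumption); the sketch below uses random numpy arrays in place of real model outputs to keep it self-contained.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padded positions."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = mask.sum(axis=1).clip(min=1e-9)  # avoid divide-by-zero
    return summed / counts

# Toy stand-in: batch of 2, seq len 4, hidden size 1024 (BERT-large width)
hidden = np.random.rand(2, 4, 1024).astype(np.float32)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
emb = mean_pool(hidden, mask)
print(emb.shape)  # (2, 1024)
```

With the real model, `hidden` would come from `model(**tokenizer(texts, return_tensors="pt", padding=True)).last_hidden_state`.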

b1ade-embed is part of a collection of small models for RAG. Stay tuned for more updates.

Use in research

Our embedding model b1ade-embed is a 335M-parameter model that demonstrates strong performance across the board. Recent research has used the model in the clinical and labor market domains, citing its #1 ranking in Semantic Textual Similarity (STS) among models under 500M parameters on the MTEB leaderboard.

We've been working on b1ade-embed to optimize the balance between latency and performance. This balance is crucial in real-world applications, especially in verticalized domains, where rapid processing of vast amounts of data can significantly impact decision-making. While achieving high accuracy is important, the ability to deliver results quickly is equally vital. Larger embedding outputs also raise storage costs in vector indexes, so striking a balance between task performance, output size, and latency matters.
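The storage side of that trade-off is easy to estimate: b1ade-embed emits 1024-dimensional F32 vectors (BERT-large width), i.e. 4 KB per vector before index overhead. A back-of-envelope sketch:

```python
DIM = 1024           # b1ade-embed output width (BERT-large hidden size)
BYTES_PER_FLOAT = 4  # F32

def index_size_gb(num_vectors, dim=DIM):
    """Raw vector storage only; graph/index overhead varies by engine."""
    return num_vectors * dim * BYTES_PER_FLOAT / 1e9

print(f"{index_size_gb(1_000_000):.1f} GB")  # → 4.1 GB for 1M chunks
```

A model with 4096-dimensional outputs would quadruple that footprint for the same corpus, which is part of why smaller embedding widths are attractive at scale.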

The medRxiv paper, "A Scalable Framework for Benchmarking Embedding Models for Clinical Tasks," provides a comprehensive evaluation of embedding models in healthcare contexts. It tested 30 models across various clinical tasks (2.1M comparisons), including analysis of patient notes, synthetic EHRs, and MIMIC-IV ICU data, as well as biomedical tasks involving PubMed abstracts and research papers. The study highlights b1ade-embed's versatility across these domains:

"Other models exhibiting strong performance in both clinical and PubMed domains include 'b1ade-embed'." It also emphasizes the model's efficiency, noting that "Models like 'b1ade-embed' demonstrate high efficiency despite smaller size, making them ideal for tasks requiring rapid processing." The paper evaluated models on short tasks such as triage notes and chief complaints, where b1ade-embed achieved a high score of 27.4, competing closely with larger models.

In the labor market context, the CEUR-WS paper demonstrates b1ade-embed's effectiveness in taxonomy enrichment. The paper states, "We evaluated the robustness of our system against a closed-world evaluation constructed using ESCO's hierarchy, achieving a 81% Positive Predictive Value (PPV) when combining all three models." This high accuracy demonstrates b1ade-embed's capability to capture nuanced semantic relationships in labor market terminology. Of course, no single model can be 👑 at everything. You should carefully evaluate task performance vs. latency for your specific embedding task: STS, retrieval, clustering, etc.
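For a quick sanity check on your own STS-style data, cosine similarity between pooled embeddings is the usual starting point. A minimal numpy sketch with toy vectors standing in for real sentence embeddings (a real evaluation would compare scores against human similarity judgments on your domain pairs):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 2-D vectors standing in for sentence embeddings
print(cosine_sim([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.707 (45° apart)
```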

Cite

@misc{b1ade_embed_2024,
    author       = { {Shreyas Subramanian} },
    title        = { {b1ade series of models} },
    year         = 2024,
    url          = { https://huggingface.co/w601sxs/b1ade-embed },
    publisher    = { Hugging Face }
}