---
base_model: nomic-ai/nomic-embed-text-v2-moe
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
license: apache-2.0
language:
- en
- es
- fr
- de
- it
- pt
- pl
- nl
- tr
- ja
- vi
- ru
- id
- ar
- cs
- ro
- sv
- el
- uk
- zh
- hu
- da
- 'no'
- hi
- fi
- bg
- ko
- sk
- th
- he
- ca
- lt
- fa
- ms
- sl
- lv
- mr
- bn
- sq
- cy
- be
- ml
- kn
- mk
- ur
- fy
- te
- eu
- sw
- so
- sd
- uz
- co
- hr
- gu
- ce
- eo
- jv
- la
- zu
- mn
- si
- ga
- ky
- tg
- my
- km
- mg
- pa
- sn
- ha
- ht
- su
- gd
- ny
- ps
- ku
- am
- ig
- lo
- mi
- nn
- sm
- yi
- st
- tl
- xh
- yo
- af
- ta
- tn
- ug
- az
- ba
- bs
- dv
- et
- gl
- gn
- gv
- hy
---
# nomic-embed-text-v2-moe: Multilingual Mixture of Experts Text Embeddings
## Model Overview

nomic-embed-text-v2-moe is a state-of-the-art (SoTA) multilingual mixture-of-experts (MoE) text embedding model:
- **High Performance**: SoTA multilingual performance among ~300M-parameter models, competitive with models twice its size
- **Multilinguality**: Supports 100+ languages and was trained on over 1.6B pairs
- **Flexible Embedding Dimension**: Trained with Matryoshka embeddings, giving a 3x reduction in storage cost with minimal performance degradation
- **Fully Open Source**: Model weights, code, and training data (see the code repo) are released
Model | Params (M) | Emb Dim | BEIR | MIRACL | Pretrain Data | Finetune Data | Code |
---|---|---|---|---|---|---|---|
Nomic Embed v2 | 305 | 768 | 52.86 | 65.80 | ✅ | ✅ | ✅ |
mE5 Base | 278 | 768 | 48.88 | 62.30 | ❌ | ❌ | ❌ |
mGTE Base | 305 | 768 | 51.10 | 63.40 | ❌ | ❌ | ❌ |
Arctic Embed v2 Base | 305 | 768 | 55.40 | 59.90 | ❌ | ❌ | ❌ |
BGE M3 | 568 | 1024 | 48.80 | 69.20 | ❌ | ✅ | ❌ |
Arctic Embed v2 Large | 568 | 1024 | 55.65 | 66.00 | ❌ | ❌ | ❌ |
mE5 Large | 560 | 1024 | 51.40 | 66.50 | ❌ | ❌ | ❌ |
## Model Architecture

- **Total Parameters**: 475M
- **Active Parameters During Inference**: 305M
- **Architecture Type**: Mixture of Experts (MoE)
- **MoE Configuration**: 8 experts with top-2 routing
- **Embedding Dimensions**: Supports flexible dimensions from 768 down to 256 through Matryoshka representation learning
- **Maximum Sequence Length**: 512 tokens
- **Languages**: Supports dozens of languages (see the Performance section)
## Usage Guide

### Installation

The model can be used through SentenceTransformers and Transformers.
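A typical environment (this command is an assumption, not an official requirements list) can be set up with:

```bash
# sentence-transformers pulls in transformers and torch; einops is included as a
# precaution since Nomic's remote-code model implementations commonly depend on it.
pip install sentence-transformers einops
```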
**Important**: the text prompt must include a task instruction prefix that tells the model which task is being performed. Use the `search_query: ` prefix for queries/questions and the `search_document: ` prefix for the corresponding documents.
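For example (the texts below are illustrative, not taken from the training data):

```python
# The prefix tells the model whether the text is a query or a document to be retrieved.
query = "search_query: What is a mixture of experts model?"
document = "search_document: A mixture-of-experts model routes each input to a small subset of expert subnetworks."
```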
### Transformers

If using Transformers, make sure to prepend the task instruction prefix:
```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v2-moe")
model = AutoModel.from_pretrained("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

sentences = ['search_document: Hello!', 'search_document: ¡Hola!']

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, masking out padding tokens.
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

model.eval()
with torch.no_grad():
    model_output = model(**encoded_input)

embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
embeddings = F.normalize(embeddings, p=2, dim=1)
```
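Because the embeddings are L2-normalized, cosine similarity reduces to a dot product. Continuing from the variables above:

```python
# Pairwise cosine similarities between the two normalized document embeddings.
similarities = embeddings @ embeddings.T
print(similarities)
```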
### SentenceTransformers

With SentenceTransformers, you can specify the `prompt_name` (either `query` or `passage`):
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)
sentences = ["Hello!", "¡Hola!"]
embeddings = model.encode(sentences, prompt_name="passage")
```
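A minimal retrieval-style sketch with the same model; the query and documents below are made up for illustration:

```python
# Encode a query and candidate documents with their respective prompts, then
# rank the documents by cosine similarity (embeddings are normalized explicitly).
query_embedding = model.encode(
    "Who painted the Mona Lisa?", prompt_name="query", normalize_embeddings=True
)
doc_embeddings = model.encode(
    [
        "The Mona Lisa was painted by Leonardo da Vinci.",
        "The Eiffel Tower is located in Paris.",
    ],
    prompt_name="passage",
    normalize_embeddings=True,
)
scores = doc_embeddings @ query_embedding  # higher score = more relevant document
print(scores)
```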
## Performance
## Best Practices

- Add appropriate prefixes to your text:
  - For queries: `search_query: `
  - For documents: `search_document: `
- Maximum input length is 512 tokens
- For optimal efficiency, consider using the 256-dimension embeddings if storage or compute is a concern (see the sketch below)
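A minimal sketch of requesting the smaller Matryoshka dimension, assuming a sentence-transformers version that supports the `truncate_dim` argument:

```python
from sentence_transformers import SentenceTransformer

# Keep only the first 256 Matryoshka dimensions of each embedding.
model = SentenceTransformer(
    "nomic-ai/nomic-embed-text-v2-moe",
    trust_remote_code=True,
    truncate_dim=256,
)
embeddings = model.encode(
    ["Hello!", "¡Hola!"], prompt_name="passage", normalize_embeddings=True
)
print(embeddings.shape)  # (2, 256)
```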
## Limitations

- Performance may vary across different languages
- Resource requirements may be higher than for traditional dense models due to the MoE architecture
- The model must be loaded with `trust_remote_code=True`
## Training Details
- Trained on 1.6 billion high-quality pairs across multiple languages
- Uses consistency filtering to ensure high-quality training data
- Incorporates Matryoshka representation learning for dimension flexibility
- Training includes both weakly-supervised contrastive pretraining and supervised finetuning
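The following is a minimal sketch, not the released training code, of how an in-batch contrastive (InfoNCE) loss can be combined with Matryoshka-style truncation so that shortened embeddings remain useful; the dimension schedule and temperature here are assumptions:

```python
import torch
import torch.nn.functional as F

def info_nce(q, d, temperature=0.05):
    # q, d: (batch, dim) L2-normalized query/document embeddings.
    # Positives sit on the diagonal; other in-batch documents act as negatives.
    logits = q @ d.T / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def matryoshka_loss(q, d, dims=(768, 512, 256)):
    # Apply the same contrastive loss to progressively truncated embedding prefixes,
    # re-normalizing after each truncation, and average the results.
    losses = []
    for k in dims:
        q_k = F.normalize(q[:, :k], p=2, dim=1)
        d_k = F.normalize(d[:, :k], p=2, dim=1)
        losses.append(info_nce(q_k, d_k))
    return torch.stack(losses).mean()
```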
## Join the Nomic Community
- Nomic: https://nomic.ai
- Discord: https://discord.gg/myY5YDR8z8
- Twitter: https://twitter.com/nomic_ai