---
base_model:
- Snowflake/snowflake-arctic-embed-l-v2.0
pipeline_tag: sentence-similarity
tags:
- xlm-roberta
- mteb
- arctic
- snowflake-arctic-embed
- text-embeddings-inference
library_name: sentence-transformers
language:
- af
- ar
- az
- be
- bg
- bn
- ca
- ceb
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fa
- fi
- fr
- gl
- gu
- he
- hi
- hr
- ht
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ky
- lo
- lt
- lv
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- pa
- pl
- pt
- qu
- ro
- ru
- si
- sk
- sl
- so
- sq
- sr
- sv
- sw
- ta
- te
- th
- tl
- tr
- uk
- ur
- vi
- yo
- zh
---

GGUF quants of [Snowflake/snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) created using [llama.cpp](https://github.com/ggerganov/llama.cpp).

Original model card:

***
## News
- 12/11/2024: Release of [Technical Report](https://arxiv.org/abs/2412.04506)
- 12/04/2024: Release of [snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) and [snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0), our newest models designed with multilingual workloads in mind.
## Models
Snowflake arctic-embed-l-v2.0 is the newest addition to Snowflake's suite of embedding models, optimized for retrieval performance and inference efficiency.
Arctic Embed 2.0 sets a new standard for multilingual embedding models, delivering high-quality multilingual text retrieval without sacrificing performance in English.
Released under the permissive Apache 2.0 license, Arctic Embed 2.0 is ideal for applications that demand reliable, enterprise-grade multilingual search and retrieval at scale.
Key Features:
1. Multilingual without compromise: Excels in English and non-English retrieval, outperforming leading open-source and proprietary models on benchmarks like MTEB Retrieval, CLEF, and MIRACL.
2. Inference efficiency: With only 303M non-embedding parameters, inference is fast and efficient at any scale.
3. Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes/vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training.
4. Drop-In Replacement: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), allowing direct drop-in replacement in any library, kernel, or inference engine that already supports that architecture.
5. Long Context Support: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), which supports a context window of up to 8,192 tokens via RoPE.
### Quality Benchmarks
Unlike most other open-source models, arctic-embed-l-v2.0 excels across both English (via MTEB Retrieval) and multilingual (via MIRACL and CLEF) benchmarks.
You no longer need to maintain separate models for high-quality English and multilingual retrieval. All numbers below are the average NDCG@10 across the benchmark in question (a minimal sketch of the metric follows the table).
| Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **snowflake-arctic-l-v2.0** | 568M | 303M | 1024 | **55.6** | 55.8 | **52.9** | **54.3** |
| snowflake-arctic-m | 109M | 86M | 768 | 54.9 | 24.9 | 34.4 | 29.1 |
| snowflake-arctic-l | 335M | 303M | 1024 | 56.0 | 34.8 | 38.2 | 33.7 |
| me5 base | 560M | 303M | 1024 | 51.4 | 54.0 | 43.0 | 34.6 |
| bge-m3 (BAAI) | 568M | 303M | 1024 | 48.8 | **56.8** | 40.8 | 41.3 |
| gte (Alibaba) | 305M | 113M | 768 | 51.1 | 52.3 | 47.7 | 53.1 |
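For reference, NDCG@10 rewards rankings that place relevant documents near the top, with a logarithmic discount by position. Below is a minimal sketch of the standard formula; the exact evaluation harness used for the numbers above is not reproduced here.
```python
import math

def ndcg_at_10(relevances):
    # `relevances` holds graded relevance labels of the retrieved
    # documents, in ranked order (best-scoring document first).
    dcg = sum(rel / math.log2(rank + 2)
              for rank, rel in enumerate(relevances[:10]))
    ideal = sorted(relevances, reverse=True)[:10]
    idcg = sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# A ranking that places the only relevant document second scores ~0.63
print(ndcg_at_10([0, 1, 0, 0]))
```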
Aside from high-quality retrieval, Arctic delivers embeddings that are easily compressible. Leverage vector truncation via MRL to decrease vector size by 4x with less than 3% degradation in quality.
Combine MRL-truncated vectors with int4 vector compression to power retrieval in as little as 128 bytes per document (a sketch of this recipe follows the table below).
| Model | Dimensions | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (Focused) | Relative Performance | CLEF (Full) | Relative Performance |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| snowflake-arctic-l-v2.0 | 1024 | 55.6 | N/A | 55.8 | N/A | 52.9 | N/A | 54.3 | N/A |
| snowflake-arctic-l-v2.0 | 256 | 54.3 | -0.18% | 54.3 | -2.70% | 51.9 | -1.81% | 53.4 | -1.53% |
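As a rough sketch of how the 128-bytes-per-document figure is reached: truncate to 256 dimensions via MRL (Sentence Transformers supports this directly through `truncate_dim`), then store each dimension in 4 bits, since 256 × 0.5 bytes = 128 bytes. The uniform scalar quantizer below is an illustrative assumption, not the exact quantization-aware scheme used in the evaluation above.
```python
import numpy as np
from sentence_transformers import SentenceTransformer

# MRL: keep only the first 256 dimensions of each embedding
model = SentenceTransformer('Snowflake/snowflake-arctic-embed-l-v2.0', truncate_dim=256)

emb = model.encode('The Data Cloud!')
emb = emb / np.linalg.norm(emb)  # re-normalize after truncation

# Illustrative int4 quantization: map each component from [-1, 1]
# onto the 16 levels {0..15}, then pack two 4-bit codes per byte.
codes = np.clip(np.round((emb + 1.0) * 7.5), 0, 15).astype(np.uint8)
packed = (codes[0::2] << 4) | codes[1::2]
print(packed.nbytes)  # 256 dims * 4 bits = 128 bytes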
## Usage
### Using Sentence Transformers
```python
from sentence_transformers import SentenceTransformer
# Load the model
model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
model = SentenceTransformer(model_name)
# Define the queries and documents
queries = ['what is snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City of Course!']
# Compute embeddings: use `prompt_name="query"` to encode queries!
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)
# Compute cosine similarity scores
scores = model.similarity(query_embeddings, document_embeddings)
# Output the results
for query, query_scores in zip(queries, scores):
    doc_score_pairs = list(zip(documents, query_scores))
    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)
```
### Using Huggingface Transformers
You can use the Hugging Face transformers package with Snowflake's arctic-embed model, as shown below. For optimal retrieval quality, use the CLS token to embed each text portion and prepend the query prefix below to queries only (this is what `prompt_name="query"` does in the Sentence Transformers example above).
```python
import torch
from transformers import AutoModel, AutoTokenizer
model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, add_pooling_layer=False)
model.eval()
query_prefix = 'query: '
queries = ['what is snowflake?', 'Where can I get the best tacos?']
queries_with_prefix = ["{}{}".format(query_prefix, i) for i in queries]
query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=8192)
documents = ['The Data Cloud!', 'Mexico City of Course!']
document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=8192)
# Compute token embeddings
with torch.no_grad():
    query_embeddings = model(**query_tokens)[0][:, 0]
    document_embeddings = model(**document_tokens)[0][:, 0]
# normalize embeddings
query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)
scores = torch.mm(query_embeddings, document_embeddings.transpose(0, 1))
for query, query_scores in zip(queries, scores):
    doc_score_pairs = list(zip(documents, query_scores))
    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
    # Output passages & scores
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)
```
This should produce the following scores:
```
Query: what is snowflake?
tensor(0.2715) The Data Cloud!
tensor(0.0661) Mexico City of Course!
Query: Where can I get the best tacos?
tensor(0.2797) Mexico City of Course!
tensor(0.1250) The Data Cloud!
```
### Using Huggingface Transformers.js
If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
```bash
npm i @huggingface/transformers
```
You can then use the model for retrieval, as follows:
```js
import { pipeline, dot } from '@huggingface/transformers';
// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Snowflake/snowflake-arctic-embed-l-v2.0', {
    dtype: 'q8',
});
// Generate sentence embeddings
const sentences = [
    'query: what is snowflake?',
    'The Data Cloud!',
    'Mexico City of Course!',
];
const output = await extractor(sentences, { normalize: true, pooling: 'cls' });
// Compute similarity scores
const [source_embeddings, ...document_embeddings] = output.tolist();
const similarities = document_embeddings.map(x => dot(source_embeddings, x));
console.log(similarities); // [0.24783534471401417, 0.05313122704326892]
```
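Because the embeddings are normalized (`normalize: true`), the dot product above is equivalent to cosine similarity.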
## Contact
Feel free to open an issue or pull request if you have any questions or suggestions about this project.
You can also email Daniel Campos (daniel.campos@snowflake.com).
## License
Arctic is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). The released models can be used for commercial purposes free of charge.