---
base_model:
- Snowflake/snowflake-arctic-embed-l-v2.0
pipeline_tag: sentence-similarity
tags:
- xlm-roberta
- mteb
- arctic
- snowflake-arctic-embed
- text-embeddings-inference
library_name: sentence-transformers
language:
- af
- ar
- az
- be
- bg
- bn
- ca
- ceb
- cs
- cy
- da
- de
- el
- en
- es
- et
- eu
- fa
- fi
- fr
- gl
- gu
- he
- hi
- hr
- ht
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ky
- lo
- lt
- lv
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- pa
- pl
- pt
- qu
- ro
- ru
- si
- sk
- sl
- so
- sq
- sr
- sv
- sw
- ta
- te
- th
- tl
- tr
- uk
- ur
- vi
- yo
- zh
---

GGUF quants of [Snowflake/snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) created using [llama.cpp](https://github.com/ggerganov/llama.cpp).

Original model card:

***

# Snowflake's Arctic-embed-l-v2.0

News | Models | Usage | Evaluation | Contact | FAQ | License | Acknowledgement

## News

- 12/11/2024: Release of the [Technical Report](https://arxiv.org/abs/2412.04506).
- 12/04/2024: Release of [snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) and [snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0), our newest models, built with multilingual workloads in mind.

## Models

Snowflake arctic-embed-l-v2.0 is the newest addition to the suite of embedding models Snowflake has released, optimized for retrieval performance and inference efficiency.
Arctic Embed 2.0 introduces a new standard for multilingual embedding models, delivering high-quality multilingual text retrieval without sacrificing performance in English.
Released under the permissive Apache 2.0 license, Arctic Embed 2.0 is ideal for applications that demand reliable, enterprise-grade multilingual search and retrieval at scale.

Key Features:

1. Multilingual without compromise: Excels in English and non-English retrieval, outperforming leading open-source and proprietary models on benchmarks like MTEB Retrieval, CLEF, and MIRACL.

2. Inference efficiency: With 303M non-embedding parameters, inference is fast and efficient at any scale.

3. Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes/vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training.

4. Drop-In Replacement: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), which allows direct drop-in replacement for inference with libraries, kernels, inference engines, etc.

5. Long Context Support: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), which can support a context window of up to 8192 tokens via the use of RoPE.

### Quality Benchmarks

Unlike most other open-source models, Arctic-embed-l-v2.0 excels across English (via MTEB Retrieval) and multilingual (via MIRACL and CLEF) retrieval.
You no longer need to maintain separate models to get high-quality English and multilingual retrieval. All numbers mentioned below are the average NDCG@10 across the dataset being discussed.

| Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| **snowflake-arctic-l-v2.0** | 568M | 303M | 1024 | **55.6** | 55.8 | **52.9** | **54.3** |
| snowflake-arctic-m | 109M | 86M | 768 | 54.9 | 24.9 | 34.4 | 29.1 |
| snowflake-arctic-l | 335M | 303M | 1024 | 56.0 | 34.8 | 38.2 | 33.7 |
| me5 base | 560M | 303M | 1024 | 51.4 | 54.0 | 43.0 | 34.6 |
| bge-m3 (BAAI) | 568M | 303M | 1024 | 48.8 | **56.8** | 40.8 | 41.3 |
| gte (Alibaba) | 305M | 113M | 768 | 51.1 | 52.3 | 47.7 | 53.1 |

Aside from high-quality retrieval, Arctic delivers embeddings that are easily compressible. Leverage vector truncation via MRL to decrease vector size by 4x with less than 3% degradation in quality. Combine MRL-truncated vectors with vector compression (Int4) to power retrieval in 128 bytes per doc.

| Model | # dimensions | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (Focused) | Relative Performance | CLEF (Full) | Relative Performance |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| snowflake-arctic-l-v2.0 | 1024 | 55.6 | N/A | 55.8 | N/A | 52.9 | N/A | 54.3 | N/A |
| snowflake-arctic-l-v2.0 | 256 | 54.3 | -0.18% | 54.3 | -2.70% | 51.9 | -1.81% | 53.4 | -1.53% |

## Usage

### Using Sentence Transformers

```python
from sentence_transformers import SentenceTransformer

# Load the model
model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
model = SentenceTransformer(model_name)

# Define the queries and documents
queries = ['what is snowflake?', 'Where can I get the best tacos?']
documents = ['The Data Cloud!', 'Mexico City of Course!']

# Compute embeddings: use `prompt_name="query"` to encode queries!
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute cosine similarity scores
scores = model.similarity(query_embeddings, document_embeddings)

# Output the results
for query, query_scores in zip(queries, scores):
    doc_score_pairs = list(zip(documents, query_scores))
    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)
```

### Using Hugging Face Transformers

You can use the transformers package to run Snowflake's arctic-embed model, as shown below. For optimal retrieval quality, use the CLS token to embed each text portion and add the query prefix below (to queries only).

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, add_pooling_layer=False)
model.eval()

# Prefix queries only; documents are embedded as-is
query_prefix = 'query: '
queries = ['what is snowflake?', 'Where can I get the best tacos?']
queries_with_prefix = ["{}{}".format(query_prefix, i) for i in queries]
query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=8192)

documents = ['The Data Cloud!', 'Mexico City of Course!']
document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=8192)

# Compute token embeddings and keep the CLS (first) token of each sequence
with torch.no_grad():
    query_embeddings = model(**query_tokens)[0][:, 0]
    document_embeddings = model(**document_tokens)[0][:, 0]

# Normalize embeddings so the dot product equals cosine similarity
query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)

scores = torch.mm(query_embeddings, document_embeddings.transpose(0, 1))

for query, query_scores in zip(queries, scores):
    doc_score_pairs = list(zip(documents, query_scores))
    doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
    # Output passages & scores
    print("Query:", query)
    for document, score in doc_score_pairs:
        print(score, document)
```

This should produce the following scores:

```
Query: what is snowflake?
tensor(0.2715) The Data Cloud!
tensor(0.0661) Mexico City of Course!
Query: Where can I get the best tacos?
tensor(0.2797) Mexico City of Course!
tensor(0.1250) The Data Cloud!
```

### Using Hugging Face Transformers.js

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:

```bash
npm i @huggingface/transformers
```

You can then use the model for retrieval, as follows:

```js
import { pipeline, dot } from '@huggingface/transformers';

// Create feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Snowflake/snowflake-arctic-embed-m-v2.0', {
    dtype: 'q8',
});

// Generate sentence embeddings
const sentences = [
    'query: what is snowflake?',
    'The Data Cloud!',
    'Mexico City of Course!',
]
const output = await extractor(sentences, { normalize: true, pooling: 'cls' });

// Compute similarity scores
const [source_embeddings, ...document_embeddings ] = output.tolist();
const similarities = document_embeddings.map(x => dot(source_embeddings, x));
console.log(similarities); // [0.24783534471401417, 0.05313122704326892]
```

## Contact

Feel free to open an issue or pull request if you have any questions or suggestions about this project.
You can also email Daniel Campos (daniel.campos@snowflake.com).

## License

Arctic is licensed under the [Apache-2](https://www.apache.org/licenses/LICENSE-2.0) license. The released models can be used for commercial purposes free of charge.
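
## Example: Compressing Embeddings

The compression arithmetic from the Quality Benchmarks section can be checked with a short sketch: truncating a 1024-dimensional embedding to 256 dimensions and storing each dimension in 4 bits gives 256 × 4 / 8 = 128 bytes per vector. The code below is a minimal illustration, not the method used to produce the reported numbers. It uses random vectors as stand-ins for model outputs, and the helper names (`truncate_and_normalize`, `quantize_int4`, `dequantize_int4`) as well as the uniform scalar quantization scheme with a data-derived clipping `limit` are assumptions made for illustration.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int = 256) -> np.ndarray:
    """MRL-style truncation: keep the first `dim` components, then re-normalize."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


def quantize_int4(vectors: np.ndarray, limit: float) -> np.ndarray:
    """Assumed scheme: clip to [-limit, limit], map to 16 levels, pack two codes per byte."""
    clipped = np.clip(vectors, -limit, limit)
    codes = np.rint((clipped + limit) / (2 * limit) * 15).astype(np.uint8)
    # Even-indexed codes go in the high nibble, odd-indexed codes in the low nibble.
    return (codes[:, 0::2] << 4) | codes[:, 1::2]


def dequantize_int4(packed: np.ndarray, limit: float) -> np.ndarray:
    """Unpack the 4-bit codes and map them back to approximate float values."""
    codes = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    codes[:, 0::2] = packed >> 4
    codes[:, 1::2] = packed & 0x0F
    return codes.astype(np.float32) / 15 * (2 * limit) - limit


# Random stand-ins for 1024-dimensional model outputs (not real embeddings).
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 1024)).astype(np.float32)

small = truncate_and_normalize(full, dim=256)   # 4x fewer dimensions
limit = float(np.abs(small).max())              # assumption: range taken from the data itself
packed = quantize_int4(small, limit)
restored = dequantize_int4(packed, limit)

print(packed.shape, packed.nbytes // len(packed))  # (4, 128) 128
```

In a real deployment the clipping range would more likely be calibrated on a sample of actual embeddings rather than on the batch being quantized, and similarity scores would be computed on the dequantized (or directly on the quantized) vectors.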