Casual-Autopsy committed on
Commit
fd1d899
·
verified ·
1 Parent(s): e919dfc

Update README.md

Files changed (1)
  1. README.md +195 -1
README.md CHANGED
@@ -84,4 +84,198 @@ language:
84
  - vi
85
  - yo
86
  - zh
87
- ---
84
  - vi
85
  - yo
86
  - zh
87
+ ---
88
+
89
+ GGUF Quants of [Snowflake/snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) created using [llama.cpp](https://github.com/ggerganov/llama.cpp)
90
+
91
+ Original model card:
92
+ ***
93
+
94
+ <h1 align="center">Snowflake's Arctic-embed-l-v2.0</h1>
95
+ <h4 align="center">
96
+ <p>
97
+ <a href=#news>News</a> |
98
+ <a href=#models>Models</a> |
99
+ <a href=#usage>Usage</a> |
100
+ <a href="#evaluation">Evaluation</a> |
101
+ <a href="#contact">Contact</a> |
102
+ <a href="#faq">FAQ</a> |
103
+ <a href="#license">License</a> |
104
+ <a href="#acknowledgement">Acknowledgement</a>
105
+ </p>
106
+ </h4>
107
+
108
+ <img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=18f5b1a3-da66-4f25-92d3-21da829509c3" />
109
+
110
+ ## News
111
+ - 12/11/2024: Release of [Technical Report](https://arxiv.org/abs/2412.04506)
112
+ - 12/04/2024: Release of [snowflake-arctic-embed-l-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0) and [snowflake-arctic-embed-m-v2.0](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0), our newest models, built with multilingual workloads in mind.
113
+
114
+ ## Models
115
+ Snowflake arctic-embed-l-v2.0 is the newest addition to the suite of embedding models Snowflake has released, optimized for retrieval performance and inference efficiency.
116
+ Arctic Embed 2.0 introduces a new standard for multilingual embedding models, delivering high-quality multilingual text retrieval without sacrificing performance in English.
117
+ Released under the permissive Apache 2.0 license, Arctic Embed 2.0 is ideal for applications that demand reliable, enterprise-grade multilingual search and retrieval at scale.
118
+
119
+ Key Features:
120
+
121
+ 1. Multilingual without compromise: Excels in English and non-English retrieval, outperforming leading open-source and proprietary models on benchmarks like MTEB Retrieval, CLEF, and MIRACL.
122
+
123
+ 2. Inference efficiency: With 303M non-embedding parameters, inference is fast and efficient at any scale.
124
+
125
+ 3. Compression-friendly: Achieves high-quality retrieval with embeddings as small as 128 bytes/vector using Matryoshka Representation Learning (MRL) and quantization-aware embedding training.
126
+
127
+ 4. Drop-In Replacement: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), which allows direct drop-in inference replacement with any form of new libraries, kernels, inference engines, etc.
128
+
129
+ 5. Long Context Support: arctic-embed-l-v2.0 builds on [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae), which can support a context window of up to 8192 tokens via the use of RoPE.
130
+
131
+
132
+ ### Quality Benchmarks
133
+ Unlike most other open-source models, Arctic-embed-l-v2.0 excels in both English (via MTEB Retrieval) and multilingual (via MIRACL and CLEF) retrieval.
134
+ You no longer need to maintain separate models for high-quality English and multilingual retrieval. All numbers below are the average NDCG@10 across the dataset being discussed.
135
+
136
+ | Model Name | # params | # non-emb params | # dimensions | BEIR (15) | MIRACL (4) | CLEF (Focused) | CLEF (Full) |
137
+ |---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
138
+ | **snowflake-arctic-l-v2.0** | 568M | 303M | 1024 | **55.6** | 55.8 | **52.9** | **54.3** |
139
+ | snowflake-arctic-m | 109M | 86M | 768 | 54.9 | 24.9 | 34.4 | 29.1 |
140
+ | snowflake-arctic-l | 335M | 303M | 1024 | 56.0 | 34.8 | 38.2 | 33.7 |
141
+ | me5 base | 560M | 303M | 1024 | 51.4 | 54.0 | 43.0 | 34.6 |
142
+ | bge-m3 (BAAI) | 568M | 303M | 1024 | 48.8 | **56.8** | 40.8 | 41.3 |
143
+ | gte (Alibaba) | 305M | 113M | 768 | 51.1 | 52.3 | 47.7 | 53.1 |
144
+
145
+ Aside from high-quality retrieval, Arctic delivers embeddings that are easily compressible. Vector truncation via MRL decreases vector size by 4x with less than 3% degradation in quality.
146
+ Combine MRL-truncated vectors with vector compression (Int4) to power retrieval in 128 bytes per doc.
147
+
148
+ | Model | # dimensions | BEIR (15) | Relative Performance | MIRACL (4) | Relative Performance | CLEF (5) | Relative Performance | CLEF (Full) | Relative Performance |
149
+ |---|---|:---:|:---:|:---:|:---:|:---:|---|---|---|
150
+ | snowflake-arctic-l-v2.0 | 1024 | 55.6 | N/A | 55.8 | N/A | 52.9 | N/A | 54.3 | N/A |
151
+ | snowflake-arctic-l-v2.0 | 256 | 54.3 | -0.18% | 54.3 | -2.70% | 51.9 | -1.81% | 53.4 | -1.53% |
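The 128-byte figure follows from the arithmetic: 256 dimensions at 4 bits each is 256 × 4 / 8 = 128 bytes. The sketch below illustrates that pipeline in plain Python, assuming a simple per-vector min/max uniform quantizer; the quantizer, the helper names (`truncate_and_normalize`, `quantize_int4`, `dequantize_int4`), and the random stand-in vector are illustrative assumptions, not the scheme used to produce the numbers in the table.

```python
import math
import random


def truncate_and_normalize(vec, dims=256):
    """Keep the first `dims` values (MRL truncation) and rescale to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]


def quantize_int4(vec):
    """Map each value to one of 16 levels and pack two 4-bit codes per byte.

    Assumption: a per-vector min/max uniform quantizer, chosen for simplicity.
    """
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 15 or 1.0
    codes = [min(15, max(0, round((x - lo) / scale))) for x in vec]
    if len(codes) % 2:
        codes.append(0)  # pad so codes pair up evenly
    packed = bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
    return packed, lo, scale


def dequantize_int4(packed, lo, scale, dims):
    """Unpack the 4-bit codes and map them back to approximate float values."""
    codes = []
    for byte in packed:
        codes.extend((byte >> 4, byte & 0x0F))
    return [lo + c * scale for c in codes[:dims]]


# Stand-in for a 1024-dim model embedding (random values, not real model output).
rng = random.Random(0)
full = [rng.gauss(0.0, 1.0) for _ in range(1024)]

small = truncate_and_normalize(full, dims=256)
packed, lo, scale = quantize_int4(small)
restored = dequantize_int4(packed, lo, scale, dims=256)

print(len(full) * 4, "bytes as float32")      # 4096
print(len(packed), "bytes after MRL + int4")  # 128
```

In practice the truncated vector should be re-normalized before scoring, as done here, because cutting dimensions changes its length.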
152
+
153
+ ## Usage
154
+
155
+ ### Using Sentence Transformers
156
+
157
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Load the model
+ model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
+ model = SentenceTransformer(model_name)
+
+ # Define the queries and documents
+ queries = ['what is snowflake?', 'Where can I get the best tacos?']
+ documents = ['The Data Cloud!', 'Mexico City of Course!']
+
+ # Compute embeddings: use `prompt_name="query"` to encode queries!
+ query_embeddings = model.encode(queries, prompt_name="query")
+ document_embeddings = model.encode(documents)
+
+ # Compute cosine similarity scores
+ scores = model.similarity(query_embeddings, document_embeddings)
+
+ # Output the results, best match first for each query
+ for query, query_scores in zip(queries, scores):
+     doc_score_pairs = list(zip(documents, query_scores))
+     doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
+     print("Query:", query)
+     for document, score in doc_score_pairs:
+         print(score, document)
+ ```
184
+
185
+
186
+
187
+ ### Using Huggingface Transformers
188
+
189
+
190
+ You can also use the transformers package directly, as shown below. For optimal retrieval quality, use the CLS token embedding for each text and add the query prefix below to queries only (not to documents).
191
+
192
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ model_name = 'Snowflake/snowflake-arctic-embed-l-v2.0'
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModel.from_pretrained(model_name, add_pooling_layer=False)
+ model.eval()
+
+ # Prefix queries only; documents are embedded as-is
+ query_prefix = 'query: '
+ queries = ['what is snowflake?', 'Where can I get the best tacos?']
+ queries_with_prefix = [f"{query_prefix}{q}" for q in queries]
+ query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=8192)
+
+ documents = ['The Data Cloud!', 'Mexico City of Course!']
+ document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=8192)
+
+ # Compute token embeddings and keep the CLS (first) token of each sequence
+ with torch.no_grad():
+     query_embeddings = model(**query_tokens)[0][:, 0]
+     document_embeddings = model(**document_tokens)[0][:, 0]
+
+ # Normalize embeddings so the dot product equals cosine similarity
+ query_embeddings = torch.nn.functional.normalize(query_embeddings, p=2, dim=1)
+ document_embeddings = torch.nn.functional.normalize(document_embeddings, p=2, dim=1)
+
+ scores = torch.mm(query_embeddings, document_embeddings.transpose(0, 1))
+ for query, query_scores in zip(queries, scores):
+     doc_score_pairs = list(zip(documents, query_scores))
+     doc_score_pairs = sorted(doc_score_pairs, key=lambda x: x[1], reverse=True)
+     # Output passages & scores
+     print("Query:", query)
+     for document, score in doc_score_pairs:
+         print(score, document)
+ ```
228
+
229
+
230
+ This should produce the following scores:
231
+
232
+ ```
233
+ Query: what is snowflake?
234
+ tensor(0.2715) The Data Cloud!
235
+ tensor(0.0661) Mexico City of Course!
236
+ Query: Where can I get the best tacos?
237
+ tensor(0.2797) Mexico City of Course!
238
+ tensor(0.1250) The Data Cloud!
239
+ ```
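The scores above are dot products of L2-normalized CLS embeddings, which is the same as cosine similarity. As a rough illustration of that scoring and ranking step without any model, here is a minimal standard-library sketch; the helper names (`l2_normalize`, `dot`, `rank_documents`) and the toy 3-dimensional vectors are hypothetical and are not model outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_documents(query_vec, doc_vecs):
    """Return (score, doc_index) pairs sorted by cosine similarity, best first."""
    q = l2_normalize(query_vec)
    scored = [(dot(q, l2_normalize(d)), i) for i, d in enumerate(doc_vecs)]
    return sorted(scored, reverse=True)


# Toy vectors standing in for embeddings (hypothetical values).
query = [1.0, 2.0, 0.0]
docs = [
    [2.0, 4.0, 0.0],  # same direction as the query
    [0.0, 0.0, 3.0],  # orthogonal to the query
    [1.0, 0.0, 0.0],  # partially aligned
]

ranking = rank_documents(query, docs)
for score, idx in ranking:
    print(f"{score:.4f} doc[{idx}]")
```

Because both sides are normalized first, vector length does not affect the ranking; only direction does.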
240
+
241
+ ### Using Huggingface Transformers.js
242
+
243
+ If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
244
+ ```bash
245
+ npm i @huggingface/transformers
246
+ ```
247
+
248
+ You can then use the model for retrieval, as follows:
249
+
250
+ ```js
+ import { pipeline, dot } from '@huggingface/transformers';
+
+ // Create feature extraction pipeline
+ const extractor = await pipeline('feature-extraction', 'Snowflake/snowflake-arctic-embed-m-v2.0', {
+     dtype: 'q8',
+ });
+
+ // Generate sentence embeddings (the first entry is the query, with its prefix)
+ const sentences = [
+     'query: what is snowflake?',
+     'The Data Cloud!',
+     'Mexico City of Course!',
+ ];
+ const output = await extractor(sentences, { normalize: true, pooling: 'cls' });
+
+ // Compute similarity scores
+ const [source_embeddings, ...document_embeddings] = output.tolist();
+ const similarities = document_embeddings.map(x => dot(source_embeddings, x));
+ console.log(similarities); // [0.24783534471401417, 0.05313122704326892]
+ ```
271
+
272
+
273
+ ## Contact
274
+
275
+
276
+ Feel free to open an issue or pull request if you have any questions or suggestions about this project.
277
+ You can also email Daniel Campos ([email protected]).
278
+
279
+
280
+ ## License
281
+ Arctic is licensed under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0). The released models can be used for commercial purposes free of charge.