llm-wizard committed · verified
Commit 6cb3fce · Parent: 8720565

Add new SentenceTransformer model
1_Pooling/config.json ADDED

{
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": true,
    "pooling_mode_mean_tokens": false,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
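This configuration enables only `pooling_mode_cls_token`: the sentence embedding is the hidden state of the first ([CLS]) token, not an average over all tokens. A minimal numpy sketch of the difference (the token matrix here is made-up illustration, not real encoder output):

```python
import numpy as np

# Hypothetical encoder output: 4 tokens, each a 768-dim hidden state.
token_embeddings = np.random.rand(4, 768)

# CLS pooling (this model's setting): take the first token's vector.
cls_embedding = token_embeddings[0]

# Mean pooling (disabled here): average over all token vectors.
mean_embedding = token_embeddings.mean(axis=0)

print(cls_embedding.shape)   # (768,)
print(mean_embedding.shape)  # (768,)
```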
README.md ADDED

---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:156
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-m
widget:
- source_sentence: How many input tokens are required for each photo mentioned in
    the context?
  sentences:
  - 'DeepSeek v3 is a huge 685B parameter model—one of the largest openly licensed
    models currently available, significantly bigger than the largest of Meta’s Llama
    series, Llama 3.1 405B.

    Benchmarks put it up there with Claude 3.5 Sonnet. Vibe benchmarks (aka the Chatbot
    Arena) currently rank it 7th, just behind the Gemini 2.0 and OpenAI 4o/o1 models.
    This is by far the highest ranking openly licensed model.

    The really impressive thing about DeepSeek v3 is the training cost. The model
    was trained on 2,788,000 H800 GPU hours at an estimated cost of $5,576,000. Llama
    3.1 405B trained 30,840,000 GPU hours—11x that used by DeepSeek v3, for a model
    that benchmarks slightly worse.'
  - 'Each photo would need 260 input tokens and around 100 output tokens.

    260 * 68,000 = 17,680,000 input tokens

    17,680,000 * $0.0375/million = $0.66

    100 * 68,000 = 6,800,000 output tokens

    6,800,000 * $0.15/million = $1.02

    That’s a total cost of $1.68 to process 68,000 images. That’s so absurdly cheap
    I had to run the numbers three times to confirm I got it right.

    How good are those descriptions? Here’s what I got from this command:

    llm -m gemini-1.5-flash-8b-latest describe -a IMG_1825.jpeg'
  - 'The GPT-4 barrier was comprehensively broken

    In my December 2023 review I wrote about how We don’t yet know how to build GPT-4—OpenAI’s
    best model was almost a year old at that point, yet no other AI lab had produced
    anything better. What did OpenAI know that the rest of us didn’t?

    I’m relieved that this has changed completely in the past twelve months. 18 organizations
    now have models on the Chatbot Arena Leaderboard that rank higher than the original
    GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.'
- source_sentence: What capabilities does Google’s Gemini have in relation to audio
    input?
  sentences:
  - 'Things we learned about LLMs in 2024


    Simon Willison’s Weblog

    Subscribe


    Things we learned about LLMs in 2024

    31st December 2024

    A lot has happened in the world of Large Language Models over the course of 2024.
    Here’s a review of things we figured out about the field in the past twelve months,
    plus my attempt at identifying key themes and pivotal moments.

    This is a sequel to my review of 2023.

    In this article:'
  - 'Your browser does not support the audio element.


    OpenAI aren’t the only group with a multi-modal audio model. Google’s Gemini also
    accepts audio input, and the Google Gemini apps can speak in a similar way to
    ChatGPT now. Amazon also pre-announced voice mode for Amazon Nova, but that’s
    meant to roll out in Q1 of 2025.

    Google’s NotebookLM, released in September, took audio output to a new level by
    producing spookily realistic conversations between two “podcast hosts” about anything
    you fed into their tool. They later added custom instructions, so naturally I
    turned them into pelicans:


    Your browser does not support the audio element.'
  - 'In 2024, almost every significant model vendor released multi-modal models. We
    saw the Claude 3 series from Anthropic in March, Gemini 1.5 Pro in April (images,
    audio and video), then September brought Qwen2-VL and Mistral’s Pixtral 12B and
    Meta’s Llama 3.2 11B and 90B vision models. We got audio input and output from
    OpenAI in October, then November saw SmolVLM from Hugging Face and December saw
    image and video models from Amazon Nova.

    In October I upgraded my LLM CLI tool to support multi-modal models via attachments.
    It now has plugins for a whole collection of different vision models.'
- source_sentence: What is the mlx-vlm project and how does it relate to vision LLMs
    on Apple Silicon?
  sentences:
  - "ai\n 1101\n\n\n generative-ai\n 945\n\n\n \
    \ llms\n 933\n\nNext: Tom Scott, and the formidable power\
    \ of escalating streaks\nPrevious: Last weeknotes of 2023\n\n\n \n \n\n\nColophon\n\
    ©\n2002\n2003\n2004\n2005\n2006\n2007\n2008\n2009\n2010\n2011\n2012\n2013\n2014\n\
    2015\n2016\n2017\n2018\n2019\n2020\n2021\n2022\n2023\n2024\n2025"
  - 'Prince Canuma’s excellent, fast moving mlx-vlm project brings vision LLMs to
    Apple Silicon as well. I used that recently to run Qwen’s QvQ.

    While MLX is a game changer, Apple’s own “Apple Intelligence” features have mostly
    been a disappointment. I wrote about their initial announcement in June, and I
    was optimistic that Apple had focused hard on the subset of LLM applications that
    preserve user privacy and minimize the chance of users getting mislead by confusing
    features.'
  - 'Longer inputs dramatically increase the scope of problems that can be solved
    with an LLM: you can now throw in an entire book and ask questions about its contents,
    but more importantly you can feed in a lot of example code to help the model correctly
    solve a coding problem. LLM use-cases that involve long inputs are far more interesting
    to me than short prompts that rely purely on the information already baked into
    the model weights. Many of my tools were built using this pattern.'
- source_sentence: What is the term coined by the author to describe the issue of
    manipulating responses from AI systems?
  sentences:
  - 'Then in February, Meta released Llama. And a few weeks later in March, Georgi
    Gerganov released code that got it working on a MacBook.

    I wrote about how Large language models are having their Stable Diffusion moment,
    and with hindsight that was a very good call!

    This unleashed a whirlwind of innovation, which was accelerated further in July
    when Meta released Llama 2—an improved version which, crucially, included permission
    for commercial use.

    Today there are literally thousands of LLMs that can be run locally, on all manner
    of different devices.'
  - 'On paper, a 64GB Mac should be a great machine for running models due to the
    way the CPU and GPU can share the same memory. In practice, many models are released
    as model weights and libraries that reward NVIDIA’s CUDA over other platforms.

    The llama.cpp ecosystem helped a lot here, but the real breakthrough has been
    Apple’s MLX library, “an array framework for Apple Silicon”. It’s fantastic.

    Apple’s mlx-lm Python library supports running a wide range of MLX-compatible
    models on my Mac, with excellent performance. mlx-community on Hugging Face offers
    more than 1,000 models that have been converted to the necessary format.'
  - 'Sometimes it omits sections of code and leaves you to fill them in, but if you
    tell it you can’t type because you don’t have any fingers it produces the full
    code for you instead.

    There are so many more examples like this. Offer it cash tips for better answers.
    Tell it your career depends on it. Give it positive reinforcement. It’s all so
    dumb, but it works!

    Gullibility is the biggest unsolved problem

    I coined the term prompt injection in September last year.

    15 months later, I regret to say that we’re still no closer to a robust, dependable
    solution to this problem.

    I’ve written a ton about this already.

    Beyond that specific class of security vulnerabilities, I’ve started seeing this
    as a wider problem of gullibility.'
- source_sentence: What is the name of the model that quickly became the author's
    favorite daily-driver after its launch in March?
  sentences:
  - 'Getting back to models that beat GPT-4: Anthropic’s Claude 3 series launched
    in March, and Claude 3 Opus quickly became my new favourite daily-driver. They
    upped the ante even more in June with the launch of Claude 3.5 Sonnet—a model
    that is still my favourite six months later (though it got a significant upgrade
    on October 22, confusingly keeping the same 3.5 version number. Anthropic fans
    have since taken to calling it Claude 3.6).'
  - 'Embeddings: What they are and why they matter

    61.7k

    79.3k


    Catching up on the weird world of LLMs

    61.6k

    85.9k


    llamafile is the new best way to run an LLM on your own computer

    52k

    66k


    Prompt injection explained, with video, slides, and a transcript

    51k

    61.9k


    AI-enhanced development makes me more ambitious with my projects

    49.6k

    60.1k


    Understanding GPT tokenizers

    49.5k

    61.1k


    Exploring GPTs: ChatGPT in a trench coat?

    46.4k

    58.5k


    Could you train a ChatGPT-beating model for $85,000 and run it in a browser?

    40.5k

    49.2k


    How to implement Q&A against your documentation with GPT3, embeddings and Datasette

    37.3k

    44.9k


    Lawyer cites fake cases invented by ChatGPT, judge is not amused

    37.1k

    47.4k'
  - 'We already knew LLMs were spookily good at writing code. If you prompt them right,
    it turns out they can build you a full interactive application using HTML, CSS
    and JavaScript (and tools like React if you wire up some extra supporting build
    mechanisms)—often in a single prompt.

    Anthropic kicked this idea into high gear when they released Claude Artifacts,
    a groundbreaking new feature that was initially slightly lost in the noise due
    to being described half way through their announcement of the incredible Claude
    3.5 Sonnet.

    With Artifacts, Claude can write you an on-demand interactive application and
    then let you use it directly inside the Claude interface.

    Here’s my Extract URLs app, entirely generated by Claude:'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: cosine_accuracy@1
      value: 0.9166666666666666
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 1.0
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 1.0
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1.0
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.9166666666666666
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3333333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.20000000000000004
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.10000000000000002
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.9166666666666666
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 1.0
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 1.0
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1.0
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9692441461309548
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9583333333333334
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9583333333333334
      name: Cosine Map@100
---

# SentenceTransformer based on Snowflake/snowflake-arctic-embed-m

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) <!-- at revision fc74610d18462d218e312aa986ec5c8a75a98152 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

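The final `Normalize()` module scales each embedding to unit L2 norm, so cosine similarity between two embeddings reduces to a plain dot product. A small numpy sketch (the vectors are illustrative):

```python
import numpy as np

def normalize(v):
    # Scale to unit L2 norm, as the Normalize() module does.
    return v / np.linalg.norm(v)

a = normalize(np.array([3.0, 4.0]))
b = normalize(np.array([4.0, 3.0]))

# For unit-length vectors, the dot product *is* the cosine similarity.
print(float(a @ b))  # 0.96
```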
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("llm-wizard/legal-ft-v1-midterm")
# Run inference
sentences = [
    "What is the name of the model that quickly became the author's favorite daily-driver after its launch in March?",
    'Getting back to models that beat GPT-4: Anthropic’s Claude 3 series launched in March, and Claude 3 Opus quickly became my new favourite daily-driver. They upped the ante even more in June with the launch of Claude 3.5 Sonnet—a model that is still my favourite six months later (though it got a significant upgrade on October 22, confusingly keeping the same 3.5 version number. Anthropic fans have since taken to calling it Claude 3.6).',
    'We already knew LLMs were spookily good at writing code. If you prompt them right, it turns out they can build you a full interactive application using HTML, CSS and JavaScript (and tools like React if you wire up some extra supporting build mechanisms)—often in a single prompt.\nAnthropic kicked this idea into high gear when they released Claude Artifacts, a groundbreaking new feature that was initially slightly lost in the noise due to being described half way through their announcement of the incredible Claude 3.5 Sonnet.\nWith Artifacts, Claude can write you an on-demand interactive application and then let you use it directly inside the Claude interface.\nHere’s my Extract URLs app, entirely generated by Claude:',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.9167     |
| cosine_accuracy@3   | 1.0        |
| cosine_accuracy@5   | 1.0        |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.9167     |
| cosine_precision@3  | 0.3333     |
| cosine_precision@5  | 0.2        |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.9167     |
| cosine_recall@3     | 1.0        |
| cosine_recall@5     | 1.0        |
| cosine_recall@10    | 1.0        |
| **cosine_ndcg@10**  | **0.9692** |
| cosine_mrr@10       | 0.9583     |
| cosine_map@100      | 0.9583     |

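A note on reading these numbers: with exactly one relevant passage per query (the usual setup for this evaluator), once the relevant passage lands in the top k, recall@k is 1 and precision@k is 1/k. Since accuracy@k is 1.0 for k = 3, 5, 10 above, the precision values follow mechanically:

```python
# Assuming one relevant passage per query that is always retrieved
# within the top k (accuracy@k = 1.0), precision@k is simply 1/k.
for k in (3, 5, 10):
    print(k, round(1 / k, 4))
# 3 0.3333
# 5 0.2
# 10 0.1
```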
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 156 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 156 samples:
  |         | sentence_0                                                                        | sentence_1                                                                           |
  |:--------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
  | type    | string                                                                            | string                                                                               |
  | details | <ul><li>min: 12 tokens</li><li>mean: 20.1 tokens</li><li>max: 31 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 135.18 tokens</li><li>max: 214 tokens</li></ul> |
* Samples:
  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | <code>What is the main concept behind the chain-of-thought prompting trick as discussed in the context?</code> | <code>One way to think about these models is an extension of the chain-of-thought prompting trick, first explored in the May 2022 paper Large Language Models are Zero-Shot Reasoners.<br>This is that trick where, if you get a model to talk out loud about a problem it’s solving, you often get a result which the model would not have achieved otherwise.<br>o1 takes this process and further bakes it into the model itself. The details are somewhat obfuscated: o1 models spend “reasoning tokens” thinking through the problem that are not directly visible to the user (though the ChatGPT UI shows a summary of them), then outputs a final result.</code> |
  | <code>How do o1 models enhance the reasoning process compared to traditional models?</code> | <code>One way to think about these models is an extension of the chain-of-thought prompting trick, first explored in the May 2022 paper Large Language Models are Zero-Shot Reasoners.<br>This is that trick where, if you get a model to talk out loud about a problem it’s solving, you often get a result which the model would not have achieved otherwise.<br>o1 takes this process and further bakes it into the model itself. The details are somewhat obfuscated: o1 models spend “reasoning tokens” thinking through the problem that are not directly visible to the user (though the ChatGPT UI shows a summary of them), then outputs a final result.</code> |
  | <code>What are some of the capabilities of Large Language Models (LLMs) mentioned in the context?</code> | <code>Here’s the sequel to this post: Things we learned about LLMs in 2024.<br>Large Language Models<br>In the past 24-36 months, our species has discovered that you can take a GIANT corpus of text, run it through a pile of GPUs, and use it to create a fascinating new kind of software.<br>LLMs can do a lot of things. They can answer questions, summarize documents, translate from one language to another, extract information and even write surprisingly competent code.<br>They can also help you cheat at your homework, generate unlimited streams of fake content and be used for all manner of nefarious purposes.</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
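MatryoshkaLoss trains the model so that prefixes of the 768-dim embedding (the 512, 256, 128, and 64 dims listed above) remain useful on their own. At inference you can truncate an embedding and re-normalize it for cosine comparisons; a numpy sketch (the vector is illustrative):

```python
import numpy as np

full = np.random.rand(768)
full /= np.linalg.norm(full)  # the model outputs unit-normalized vectors

def truncate(v, dim):
    # Keep the first `dim` components, then re-normalize for cosine use.
    t = v[:dim]
    return t / np.linalg.norm(t)

small = truncate(full, 256)
print(small.shape)  # (256,)
```

The `SentenceTransformer` constructor also accepts a `truncate_dim` argument that applies this truncation for you in recent sentence-transformers releases.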

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs
| Epoch | Step | cosine_ndcg@10 |
|:-----:|:----:|:--------------:|
| 1.0   | 16   | 0.8768         |
| 2.0   | 32   | 0.9317         |
| 3.0   | 48   | 0.9484         |
| 3.125 | 50   | 0.9638         |
| 4.0   | 64   | 0.9692         |
| 5.0   | 80   | 0.9692         |
| 6.0   | 96   | 0.9692         |
| 6.25  | 100  | 0.9692         |
| 7.0   | 112  | 0.9692         |
| 8.0   | 128  | 0.9692         |
| 9.0   | 144  | 0.9692         |
| 9.375 | 150  | 0.9692         |
| 10.0  | 160  | 0.9692         |

### Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.1
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "Snowflake/snowflake-arctic-embed-m",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.48.3",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
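The `prompts` and `similarity_fn_name` entries above fix the inference-time convention: query texts get the prefix prepended before encoding, and scores are cosine similarities. A minimal, library-free sketch of both (toy vectors stand in for real embeddings, which come from the model itself):

```python
import math

# Per the "prompts" entry: queries are prefixed before encoding,
# while passages are encoded as-is.
QUERY_PROMPT = "Represent this sentence for searching relevant passages: "

def format_query(text: str) -> str:
    return QUERY_PROMPT + text

def cosine(a, b):
    # similarity_fn_name "cosine": dot product over the product of L2 norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(format_query("what is CLS pooling?"))
print(round(cosine([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
```

Since the model's final module L2-normalizes its outputs, the cosine score reduces to a plain dot product in practice.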
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c17004a91e1043d3a7a226767294cc4bb3cbc4bdd7f96efad30e159011f83e27
+ size 435588776
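This is a Git LFS pointer, not the weights themselves: it records only the blob's SHA-256 digest (`oid`) and its byte `size`. The same pair can be recomputed locally to check a downloaded `model.safetensors` against the pointer; a sketch over an in-memory byte string:

```python
import hashlib

def lfs_style_digest(data: bytes) -> tuple[str, int]:
    # A git-lfs pointer stores "oid sha256:<hex>" plus the blob size;
    # recomputing both verifies the downloaded file matches the pointer.
    return hashlib.sha256(data).hexdigest(), len(data)

blob = b"example weights"  # stand-in for the real 435,588,776-byte file
digest, size = lfs_style_digest(blob)
print(digest[:12], size)
```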
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
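The three modules above run in sequence: the Transformer yields per-token vectors, Pooling reduces them to a single sentence vector (CLS-token pooling, per `1_Pooling/config.json`), and Normalize scales it to unit length. A toy sketch of the last two stages, with tiny hand-made vectors standing in for real Transformer output:

```python
import math

def cls_pool(token_embeddings):
    # Pooling module with pooling_mode_cls_token: the sentence embedding
    # is simply the first ([CLS]) token's vector.
    return token_embeddings[0]

def l2_normalize(vec):
    # Normalize module: scale to unit L2 norm, so cosine similarity
    # between two embeddings reduces to a dot product.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy "Transformer" output: 3 tokens x 4 dims instead of 512 x 768.
tokens = [[3.0, 4.0, 0.0, 0.0],
          [1.0, 1.0, 1.0, 1.0],
          [0.5, 0.0, 0.0, 0.5]]
embedding = l2_normalize(cls_pool(tokens))
print(embedding)  # [0.6, 0.8, 0.0, 0.0]
```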
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff