dabraldeepti25 committed on
Commit 3e54a33 · verified · 1 Parent(s): 49201dc

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
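A minimal sketch of what this pooling config means, using random stand-in activations in place of real BERT outputs (the array shapes are the only thing taken from the config; the values are made up):

```python
import numpy as np

# Hypothetical token embeddings for one sentence: (seq_len, hidden) = (5, 1024).
token_embeddings = np.random.rand(5, 1024)

# "pooling_mode_cls_token": true — the sentence embedding is the hidden state of
# the first ([CLS]) token; no mean/max pooling over the remaining tokens.
sentence_embedding = token_embeddings[0]

# The model's Normalize() module then rescales it to unit length, so cosine
# similarity later reduces to a plain dot product.
sentence_embedding = sentence_embedding / np.linalg.norm(sentence_embedding)
print(sentence_embedding.shape)  # (1024,)
```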
README.md ADDED
@@ -0,0 +1,720 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:156
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Snowflake/snowflake-arctic-embed-l
+ widget:
+ - source_sentence: How much has the cost of using OpenAI's most expensive model changed
+     compared to the previous pricing?
+   sentences:
+   - Synthetic data as a substantial component of pretraining is becoming increasingly
+     common, and the Phi series of models has consistently emphasized the importance
+     of synthetic data. Rather than serving as a cheap substitute for organic data,
+     synthetic data has several direct advantages over organic data.
+   - 'Here’s the rest of the transcript. It’s bland and generic, but my phone can pitch
+     bland and generic Christmas movies to Netflix now!
+
+     LLM prices crashed, thanks to competition and increased efficiency
+
+     The past twelve months have seen a dramatic collapse in the cost of running a
+     prompt through the top tier hosted LLMs.
+
+     In December 2023 (here’s the Internet Archive for the OpenAI pricing page) OpenAI
+     were charging $30/million input tokens for GPT-4, $10/mTok for the then-new GPT-4
+     Turbo and $1/mTok for GPT-3.5 Turbo.
+
+     Today $30/mTok gets you OpenAI’s most expensive model, o1. GPT-4o is $2.50 (12x
+     cheaper than GPT-4) and GPT-4o mini is $0.15/mTok—nearly 7x cheaper than GPT-3.5
+     and massively more capable.'
+   - 'Then there’s the rest. If you browse the Chatbot Arena leaderboard today—still
+     the most useful single place to get a vibes-based evaluation of models—you’ll
+     see that GPT-4-0314 has fallen to around 70th place. The 18 organizations with
+     higher scoring models are Google, OpenAI, Alibaba, Anthropic, Meta, Reka AI, 01
+     AI, Amazon, Cohere, DeepSeek, Nvidia, Mistral, NexusFlow, Zhipu AI, xAI, AI21
+     Labs, Princeton and Tencent.
+
+     Training a GPT-4 beating model was a huge deal in 2023. In 2024 it’s an achievement
+     that isn’t even particularly notable, though I personally still celebrate any
+     time a new organization joins that list.
+
+     Some of those GPT-4 models run on my laptop'
+ - source_sentence: What are some potential consequences of making decisions based
+     on hype and misinformation?
+   sentences:
+   - 'The GPT-4 barrier was comprehensively broken
+
+     In my December 2023 review I wrote about how We don’t yet know how to build GPT-4—OpenAI’s
+     best model was almost a year old at that point, yet no other AI lab had produced
+     anything better. What did OpenAI know that the rest of us didn’t?
+
+     I’m relieved that this has changed completely in the past twelve months. 18 organizations
+     now have models on the Chatbot Arena Leaderboard that rank higher than the original
+     GPT-4 from March 2023 (GPT-4-0314 on the board)—70 models in total.'
+   - 'I like people who are skeptical of this stuff. The hype has been deafening for
+     more than two years now, and there are enormous quantities of snake oil and misinformation
+     out there. A lot of very bad decisions are being made based on that hype. Being
+     critical is a virtue.
+
+     If we want people with decision-making authority to make good decisions about
+     how to apply these tools we first need to acknowledge that there ARE good applications,
+     and then help explain how to put those into practice while avoiding the many unintiutive
+     traps.
+
+     (If you still don’t think there are any good applications at all I’m not sure
+     why you made it to this point in the article!)'
+   - '17th: AI for Data Journalism: demonstrating what we can do with this stuff right
+     now
+
+
+     22nd: Options for accessing Llama 3 from the terminal using LLM
+
+
+
+
+     May
+
+
+     8th: Slop is the new name for unwanted AI-generated content
+
+
+     15th: ChatGPT in “4o” mode is not running the new features yet
+
+
+     29th: Training is not the same as chatting: ChatGPT and other LLMs don’t remember
+     everything you say
+
+
+
+
+     June
+
+
+     6th: Accidental prompt injection against RAG applications
+
+
+     10th: Thoughts on the WWDC 2024 keynote on Apple Intelligence
+
+
+     17th: Language models on the command-line
+
+
+     21st: Building search-based RAG using Claude, Datasette and Val Town
+
+
+     27th: Open challenges for AI engineering
+
+
+
+
+     July
+
+
+     14th: Imitation Intelligence, my keynote for PyCon US 2024'
+ - source_sentence: What advancements have been made in multimodal vision and audio/video
+     capabilities in LLMs?
+   sentences:
+   - 'The year of slop
+
+     2024 was the year that the word "slop" became a term of art. I wrote about this
+     in May, expanding on this tweet by @deepfates:'
+   - 'The GPT-4 barrier was comprehensively broken
+
+     Some of those GPT-4 models run on my laptop
+
+     LLM prices crashed, thanks to competition and increased efficiency
+
+     Multimodal vision is common, audio and video are starting to emerge
+
+     Voice and live camera mode are science fiction come to life
+
+     Prompt driven app generation is a commodity already
+
+     Universal access to the best models lasted for just a few short months
+
+     “Agents” still haven’t really happened yet
+
+     Evals really matter
+
+     Apple Intelligence is bad, Apple’s MLX library is excellent
+
+     The rise of inference-scaling “reasoning” models
+
+     Was the best currently available LLM trained in China for less than $6m?
+
+     The environmental impact got better
+
+     The environmental impact got much, much worse'
+   - "Posted 31st December 2024 at 6:07 pm · Follow me on Mastodon or Twitter or subscribe\
+     \ to my newsletter\n\n\nMore recent articles\n\nLLM 0.22, the annotated release\
+     \ notes - 17th February 2025\nRun LLMs on macOS using llm-mlx and Apple's MLX\
+     \ framework - 15th February 2025\nURL-addressable Pyodide Python environments\
+     \ - 13th February 2025\n\n\n \n\n\nThis is Things we learned about LLMs in 2024\
+     \ by Simon Willison, posted on 31st December 2024.\n\nPart of series LLMs annual\
+     \ review\n\nStuff we figured out about AI in 2023 - Dec. 31, 2023, 11:59 p.m.\
+     \ \nThings we learned about LLMs in 2024 - Dec. 31, 2024, 6:07 p.m. \n\n\n\n \
+     \ google\n 347\n\n\n ai\n 1100\n\n\n\
+     \ openai\n 257"
+ - source_sentence: When did the author first run a large language model on their laptop?
+   sentences:
+   - '24th: Notes on the new Claude analysis JavaScript code execution tool
+
+
+     27th: Run a prompt to generate and execute jq programs using llm-jq
+
+
+     29th: You can now run prompts against images, audio and video in your terminal
+     using LLM
+
+
+     30th: W̶e̶e̶k̶n̶o̶t̶e̶s̶ Monthnotes for October
+
+
+
+
+     November
+
+
+     4th: Claude 3.5 Haiku
+
+
+     7th: Project: VERDAD—tracking misinformation in radio broadcasts using Gemini
+     1.5
+
+
+     12th: Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac
+
+
+     19th: Notes from Bing Chat—Our First Encounter With Manipulative AI
+
+
+     25th: Ask questions of SQLite databases and CSV/JSON files in your terminal
+
+
+
+
+     December
+
+
+     4th: First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin)
+
+
+     7th: Prompts.js'
+   - '260 input tokens, 92 output tokens. Cost approximately 0.0024 cents (that’s less
+     than a 400th of a cent).
+
+     This increase in efficiency and reduction in price is my single favourite trend
+     from 2024. I want the utility of LLMs at a fraction of the energy cost and it
+     looks like that’s what we’re getting.
+
+     Multimodal vision is common, audio and video are starting to emerge
+
+     My butterfly example above illustrates another key trend from 2024: the rise of
+     multi-modal LLMs.
+
+     A year ago the single most notable example of these was GPT-4 Vision, released
+     at OpenAI’s DevDay in November 2023. Google’s multi-modal Gemini 1.0 was announced
+     on December 7th 2023 so it also (just) makes it into the 2023 window.'
+   - 'My personal laptop is a 64GB M2 MacBook Pro from 2023. It’s a powerful machine,
+     but it’s also nearly two years old now—and crucially it’s the same laptop I’ve
+     been using ever since I first ran an LLM on my computer back in March 2023 (see
+     Large language models are having their Stable Diffusion moment).
+
+     That same laptop that could just about run a GPT-3-class model in March last year
+     has now run multiple GPT-4 class models! Some of my notes on that:'
+ - source_sentence: What notable development in LLM technology occurred in the final
+     quarter of 2024?
+   sentences:
+   - 'Now that those features are rolling out they’re pretty weak. As an LLM power-user
+     I know what these models are capable of, and Apple’s LLM features offer a pale
+     imitation of what a frontier LLM can do. Instead we’re getting notification summaries
+     that misrepresent news headlines and writing assistant tools that I’ve not found
+     useful at all. Genmoji are kind of fun though.
+
+     The rise of inference-scaling “reasoning” models
+
+     The most interesting development in the final quarter of 2024 was the introduction
+     of a new shape of LLM, exemplified by OpenAI’s o1 models—initially released as
+     o1-preview and o1-mini on September 12th.'
+   - 'The year of slop
+
+     Synthetic training data works great
+
+     LLMs somehow got even harder to use
+
+     Knowledge is incredibly unevenly distributed
+
+     LLMs need better criticism
+
+     Everything tagged “llms” on my blog in 2024'
+   - 'Prompt injection is a natural consequence of this gulibility. I’ve seen precious
+     little progress on tackling that problem in 2024, and we’ve been talking about
+     it since September 2022.
+
+     I’m beginning to see the most popular idea of “agents” as dependent on AGI itself.
+     A model that’s robust against gulliblity is a very tall order indeed.
+
+     Evals really matter
+
+     Anthropic’s Amanda Askell (responsible for much of the work behind Claude’s Character):'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.8333333333333334
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.8333333333333334
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.20000000000000004
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.10000000000000002
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.8333333333333334
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9330328858630988
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9097222222222222
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9097222222222222
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("dabraldeepti25/legal-ft-v0")
+ # Run inference
+ sentences = [
+     'What notable development in LLM technology occurred in the final quarter of 2024?',
+     'Now that those features are rolling out they’re pretty weak. As an LLM power-user I know what these models are capable of, and Apple’s LLM features offer a pale imitation of what a frontier LLM can do. Instead we’re getting notification summaries that misrepresent news headlines and writing assistant tools that I’ve not found useful at all. Genmoji are kind of fun though.\nThe rise of inference-scaling “reasoning” models\nThe most interesting development in the final quarter of 2024 was the introduction of a new shape of LLM, exemplified by OpenAI’s o1 models—initially released as o1-preview and o1-mini on September 12th.',
+     'Prompt injection is a natural consequence of this gulibility. I’ve seen precious little progress on tackling that problem in 2024, and we’ve been talking about it since September 2022.\nI’m beginning to see the most popular idea of “agents” as dependent on AGI itself. A model that’s robust against gulliblity is a very tall order indeed.\nEvals really matter\nAnthropic’s Amanda Askell (responsible for much of the work behind Claude’s Character):',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
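Since the model card declares `similarity_fn_name: cosine` and the architecture ends in a `Normalize()` module, `model.similarity(...)` in the snippet above amounts to a dot product of unit vectors. A minimal sketch with stand-in arrays in place of real `model.encode()` output:

```python
import numpy as np

# Stand-in embeddings (3 sentences, 1024 dims) replacing real model output;
# this model's Normalize() module already makes real embeddings unit-length.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 1024))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Cosine similarity over unit vectors is just a matrix of dot products, so each
# sentence is maximally similar (score 1.0) to itself on the diagonal.
similarities = embeddings @ embeddings.T
print(similarities.shape)  # (3, 3)
```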
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value     |
+ |:--------------------|:----------|
+ | cosine_accuracy@1   | 0.8333    |
+ | cosine_accuracy@3   | 1.0       |
+ | cosine_accuracy@5   | 1.0       |
+ | cosine_accuracy@10  | 1.0       |
+ | cosine_precision@1  | 0.8333    |
+ | cosine_precision@3  | 0.3333    |
+ | cosine_precision@5  | 0.2       |
+ | cosine_precision@10 | 0.1       |
+ | cosine_recall@1     | 0.8333    |
+ | cosine_recall@3     | 1.0       |
+ | cosine_recall@5     | 1.0       |
+ | cosine_recall@10    | 1.0       |
+ | **cosine_ndcg@10**  | **0.933** |
+ | cosine_mrr@10       | 0.9097    |
+ | cosine_map@100      | 0.9097    |
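A toy illustration of how the accuracy@k rows in this table are computed: for each query, check whether its relevant passage appears among the top-k ranked passages. The scores below are made up (query i's relevant passage is passage i); only the procedure mirrors the evaluator:

```python
import numpy as np

# Hypothetical query-passage cosine scores, one row per query.
scores = np.array([
    [0.9, 0.2, 0.1],   # query 0: relevant passage ranked 1st -> hit at k=1
    [0.4, 0.3, 0.5],   # query 1: relevant passage ranked 2nd -> miss at k=1, hit at k=3
    [0.1, 0.2, 0.8],   # query 2: relevant passage ranked 1st -> hit at k=1
])

def accuracy_at_k(scores: np.ndarray, k: int) -> float:
    hits = 0
    for q, row in enumerate(scores):
        top_k = np.argsort(-row)[:k]  # indices of the k highest-scoring passages
        hits += int(q in top_k)       # relevant passage for query q is passage q
    return hits / len(scores)

print(accuracy_at_k(scores, 1))  # 0.666...
print(accuracy_at_k(scores, 3))  # 1.0
```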
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 156 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 156 samples:
+   |         | sentence_0                                                                         | sentence_1                                                                          |
+   |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                              |
+   | details | <ul><li>min: 13 tokens</li><li>mean: 20.06 tokens</li><li>max: 33 tokens</li></ul> | <ul><li>min: 43 tokens</li><li>mean: 130.5 tokens</li><li>max: 204 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:-----------|:-----------|
+   | <code>What is the significance of Claude Artifacts in the context of LLMs and application development?</code> | <code>We already knew LLMs were spookily good at writing code. If you prompt them right, it turns out they can build you a full interactive application using HTML, CSS and JavaScript (and tools like React if you wire up some extra supporting build mechanisms)—often in a single prompt.<br>Anthropic kicked this idea into high gear when they released Claude Artifacts, a groundbreaking new feature that was initially slightly lost in the noise due to being described half way through their announcement of the incredible Claude 3.5 Sonnet.<br>With Artifacts, Claude can write you an on-demand interactive application and then let you use it directly inside the Claude interface.<br>Here’s my Extract URLs app, entirely generated by Claude:</code> |
+   | <code>How does Claude enable users to interact with applications generated by its capabilities?</code> | <code>We already knew LLMs were spookily good at writing code. If you prompt them right, it turns out they can build you a full interactive application using HTML, CSS and JavaScript (and tools like React if you wire up some extra supporting build mechanisms)—often in a single prompt.<br>Anthropic kicked this idea into high gear when they released Claude Artifacts, a groundbreaking new feature that was initially slightly lost in the noise due to being described half way through their announcement of the incredible Claude 3.5 Sonnet.<br>With Artifacts, Claude can write you an on-demand interactive application and then let you use it directly inside the Claude interface.<br>Here’s my Extract URLs app, entirely generated by Claude:</code> |
+   | <code>What are some of the new capabilities introduced in multi-modal models that enhance their functionality beyond text?</code> | <code>I think people who complain that LLM improvement has slowed are often missing the enormous advances in these multi-modal models. Being able to run prompts against images (and audio and video) is a fascinating new way to apply these models.<br>Voice and live camera mode are science fiction come to life<br>The audio and live video modes that have started to emerge deserve a special mention.<br>The ability to talk to ChatGPT first arrived in September 2023, but it was mostly an illusion: OpenAI used their excellent Whisper speech-to-text model and a new text-to-speech model (creatively named tts-1) to enable conversations with the ChatGPT mobile apps, but the actual model just saw text.</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
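MatryoshkaLoss trains nested prefixes of the embedding (768/512/256/128/64 dims here) to be usable on their own. A minimal sketch of consuming a smaller Matryoshka dimension, with a stand-in unit vector in place of a real embedding (recent sentence-transformers versions can also do this at load time via a `truncate_dim` argument):

```python
import numpy as np

# Stand-in for a 1024-d unit-length embedding from this model.
rng = np.random.default_rng(0)
embedding = rng.normal(size=1024)
embedding /= np.linalg.norm(embedding)

# Keep only the leading 256 dims (one of the trained Matryoshka sizes), then
# re-normalize so cosine similarity remains a dot product of unit vectors.
embedding_256 = embedding[:256]
embedding_256 = embedding_256 / np.linalg.norm(embedding_256)
print(embedding_256.shape)  # (256,)
```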
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `num_train_epochs`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | cosine_ndcg@10 |
+ |:-----:|:----:|:--------------:|
+ | 1.0   | 16   | 0.9039         |
+ | 2.0   | 32   | 0.9010         |
+ | 3.0   | 48   | 0.9218         |
+ | 3.125 | 50   | 0.9218         |
+ | 4.0   | 64   | 0.9218         |
+ | 5.0   | 80   | 0.9247         |
+ | 6.0   | 96   | 0.9330         |
+ | 6.25  | 100  | 0.9330         |
+ | 7.0   | 112  | 0.9330         |
+ | 8.0   | 128  | 0.9330         |
+ | 9.0   | 144  | 0.9330         |
+ | 9.375 | 150  | 0.9330         |
+ | 10.0  | 160  | 0.9330         |
+
+
+ ### Framework Versions
+ - Python: 3.11.11
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.3
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.3.0
+ - Datasets: 3.3.1
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "_name_or_path": "Snowflake/snowflake-arctic-embed-l",
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 1024,
+ "initializer_range": 0.02,
+ "intermediate_size": 4096,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 24,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.48.3",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
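The config above describes a BERT-large-style encoder: 24 layers, hidden size 1024, 16 attention heads, and a 4096-dim feed-forward. As a rough sanity check, the parameter count implied by these values can be tallied in a few lines; the sketch below excludes the pooler head, which this pipeline (CLS pooling) does not use, and the result lands within a few tens of kilobytes of the 1,336,413,848-byte float32 `model.safetensors` (the small remainder is the safetensors header).

```python
# Tally BertModel parameters from the config values above (a rough check,
# pooler head excluded since CLS pooling reads the hidden states directly).
V, H, L, I, P, T = 30522, 1024, 24, 4096, 512, 2  # vocab, hidden, layers, FFN, positions, token types

embeddings = (V + P + T) * H + 2 * H  # word/position/type tables + LayerNorm
per_layer = (
    4 * (H * H + H)  # Q, K, V, O projections with biases
    + 2 * H          # attention LayerNorm
    + H * I + I      # intermediate dense
    + I * H + H      # output dense
    + 2 * H          # output LayerNorm
)
total = embeddings + L * per_layer

print(total)      # 334092288 parameters (~334M)
print(total * 4)  # float32 bytes, close to the 1,336,413,848-byte checkpoint
```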
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.4.1",
+ "transformers": "4.48.3",
+ "pytorch": "2.5.1+cu124"
+ },
+ "prompts": {
+ "query": "Represent this sentence for searching relevant passages: "
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
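Per this config, query texts are prefixed with the `query` prompt before encoding (passages are encoded as-is), and scores use cosine similarity. A minimal sketch of both, using stand-in 2-d vectors in place of real 1024-dim embeddings:

```python
import numpy as np

# Prompt string taken verbatim from config_sentence_transformers.json above.
QUERY_PROMPT = "Represent this sentence for searching relevant passages: "

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, the similarity_fn_name declared in the config."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# With SentenceTransformer the prefix is applied automatically via
# model.encode(..., prompt_name="query"); shown manually here for clarity.
query_text = QUERY_PROMPT + "What is Matryoshka representation learning?"

q, p = np.array([0.6, 0.8]), np.array([0.8, 0.6])  # stand-in embeddings
print(cosine(q, p))  # 0.96
```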
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b5ace2531486c4a6541343de9dd41b2217d6229a61854ecc77b8d1496a4c618c
+ size 1336413848
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
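The three modules listed above run in sequence: the Transformer produces per-token embeddings, Pooling keeps the `[CLS]` token (per `1_Pooling/config.json`), and Normalize L2-normalizes the result, so dot product and cosine similarity coincide on the output. A numpy sketch of the pooling and normalization steps, with stand-in token embeddings:

```python
import numpy as np

# Stand-in Transformer output for one sentence: seq_len=4 tokens, dim=1024.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(4, 1024))

pooled = token_embeddings[0]  # CLS pooling: take the first ([CLS]) token
sentence_embedding = pooled / np.linalg.norm(pooled)  # Normalize module

print(sentence_embedding.shape)            # (1024,)
print(np.linalg.norm(sentence_embedding))  # 1.0 — unit length
```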
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "max_length": 512,
+ "model_max_length": 512,
+ "pad_to_multiple_of": null,
+ "pad_token": "[PAD]",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "[SEP]",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+ }
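The ids above (PAD=0, UNK=100, CLS=101, SEP=102, MASK=103) are the standard BERT special-token slots. A single input is framed as `[CLS] … [SEP]`, truncated to `model_max_length`, and padded on the right (`padding_side: "right"`). A sketch with stand-in token ids (the real tokenizer handles this internally):

```python
# Special-token ids from added_tokens_decoder above.
CLS, SEP, PAD = 101, 102, 0

def frame(token_ids: list[int], max_len: int) -> list[int]:
    """Wrap a tokenized sequence in [CLS]/[SEP], truncate, right-pad."""
    ids = [CLS] + token_ids[: max_len - 2] + [SEP]
    return ids + [PAD] * (max_len - len(ids))

# 7592 and 2088 are stand-in wordpiece ids for illustration only.
print(frame([7592, 2088], 8))  # [101, 7592, 2088, 102, 0, 0, 0, 0]
```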
vocab.txt ADDED
The diff for this file is too large to render. See raw diff