Commit 0e050a7 (verified)
Author: MikeCraBash
Parent(s): bd2e07b

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
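
The pooling config above selects CLS-token pooling over the encoder's 1024-dimensional token embeddings. As a minimal sketch with dummy tensors (not this model's actual outputs), CLS pooling simply keeps the first token's embedding for each input:

```python
import torch

# Dummy token embeddings shaped (batch, seq_len, hidden), as the transformer emits
token_embeddings = torch.randn(2, 128, 1024)

# CLS pooling: take the first ([CLS]) token's embedding as the sentence embedding
sentence_embeddings = token_embeddings[:, 0]
print(sentence_embeddings.shape)  # torch.Size([2, 1024])
```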
README.md ADDED
@@ -0,0 +1,601 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:400
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: Which specific areas of law are mentioned as being unaffected by
+     this Regulation?
+   sentences:
+   - (4)
+   - '(45)
+
+
+
+     Practices that are prohibited by Union law, including data protection law, non-discrimination
+     law, consumer protection law, and competition law, should not be affected by this
+     Regulation.
+
+
+
+
+
+
+
+
+
+
+
+
+
+     (46)'
+   - Union harmonisation legislation in an optimal manner. AI systems identified as
+     high-risk should be limited to those that have a significant harmful impact on
+     the health, safety and fundamental rights of persons in the Union and such limitation
+     should minimise any potential restriction to international trade.
+ - source_sentence: How does AI contribute to environmentally beneficial outcomes?
+   sentences:
+   - AI is a fast evolving family of technologies that contributes to a wide array
+     of economic, environmental and societal benefits across the entire spectrum of
+     industries and social activities. By improving prediction, optimising operations
+     and resource allocation, and personalising digital solutions available for individuals
+     and organisations, the use of AI can provide key competitive advantages to undertakings
+     and support socially and environmentally beneficial outcomes, for example in healthcare,
+     agriculture, food safety, education and training, media, sports, culture, infrastructure
+     management, energy, transport and logistics, public services, security, justice,
+     resource and energy efficiency, environmental monitoring, the conservation
+   - To mitigate the risks from high-risk AI systems placed on the market or put into
+     service and to ensure a high level of trustworthiness, certain mandatory requirements
+     should apply to high-risk AI systems, taking into account the intended purpose
+     and the context of use of the AI system and according to the risk-management system
+     to be established by the provider. The measures adopted by the providers to comply
+     with the mandatory requirements of this Regulation should take into account the
+     generally acknowledged state of the art on AI, be proportionate and effective
+     to meet the objectives of this Regulation. Based on the New Legislative Framework,
+     as clarified in Commission notice ‘The “Blue Guide” on the implementation of EU
+     product rules
+   - 'Having regard to the proposal from the European Commission,
+
+
+
+     After transmission of the draft legislative act to the national parliaments,
+
+
+
+     Having regard to the opinion of the European Economic and Social Committee (1),
+
+
+
+     Having regard to the opinion of the European Central Bank (2),
+
+
+
+     Having regard to the opinion of the Committee of the Regions (3),
+
+
+
+     Acting in accordance with the ordinary legislative procedure (4),
+
+
+     Whereas:
+
+
+
+
+
+
+
+
+     (1)'
+ - source_sentence: What is the role of the Commission in providing guidance for the
+     implementation of conditions for non-high-risk AI systems?
+   sentences:
+   - of suspects should not be ignored, in particular the difficulty in obtaining meaningful
+     information on the functioning of those systems and the resulting difficulty in
+     challenging their results in court, in particular by natural persons under investigation.
+   - of the conditions referred to above should draw up documentation of the assessment
+     before that system is placed on the market or put into service and should provide
+     that documentation to national competent authorities upon request. Such a provider
+     should be obliged to register the AI system in the EU database established under
+     this Regulation. With a view to providing further guidance for the practical implementation
+     of the conditions under which the AI systems listed in an annex to this Regulation
+     are, on an exceptional basis, non-high-risk, the Commission should, after consulting
+     the Board, provide guidelines specifying that practical implementation, completed
+     by a comprehensive list of practical examples of use cases of AI systems that
+   - completed human activity that may be relevant for the purposes of the high-risk
+     uses listed in an annex to this Regulation. Considering those characteristics,
+     the AI system provides only an additional layer to a human activity with consequently
+     lowered risk. That condition would, for example, apply to AI systems that are
+     intended to improve the language used in previously drafted documents, for example
+     in relation to professional tone, academic style of language or by aligning text
+     to a certain brand messaging. The third condition should be that the AI system
+     is intended to detect decision-making patterns or deviations from prior decision-making
+     patterns. The risk would be lowered because the use of the AI system follows a previously
+ - source_sentence: How does the context surrounding the number 39 influence its interpretation?
+   sentences:
+   - (39)
+   - requested by the European Parliament (6).
+   - under the UN Convention relating to the Status of Refugees done at Geneva on 28 July
+     1951 as amended by the Protocol of 31 January 1967. Nor should they be used to
+     in any way infringe on the principle of non-refoulement, or to deny safe and effective
+     legal avenues into the territory of the Union, including the right to international
+     protection.
+ - source_sentence: How does the number 63 relate to the overall theme or subject being
+     discussed?
+   sentences:
+   - (60)
+   - (63)
+   - The deployment of AI systems in education is important to promote high-quality
+     digital education and training and to allow all learners and teachers to acquire
+     and share the necessary digital skills and competences, including media literacy,
+     and critical thinking, to take an active part in the economy, society, and in
+     democratic processes. However, AI systems used in education or vocational training,
+     in particular for determining access or admission, for assigning persons to educational
+     and vocational training institutions or programmes at all levels, for evaluating
+     learning outcomes of persons, for assessing the appropriate level of education
+     for an individual and materially influencing the level of education and training
+     that individuals
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9583333333333334
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9583333333333334
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19999999999999998
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09999999999999999
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9583333333333334
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9791666666666666
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9722222222222223
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9722222222222222
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer
+
+ This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
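+
+ The three modules above correspond to `modules.json` in this repository: a BERT transformer, CLS-token pooling, and L2 normalization. As an illustrative sketch (not the required loading path), an equivalent pipeline could be assembled by hand from the base encoder named in `config.json`, `Snowflake/snowflake-arctic-embed-l`:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, models
+
+ # Illustrative reconstruction of the same three-module pipeline;
+ # loading "MikeCraBash/legal-ft-1" directly is the normal path.
+ transformer = models.Transformer("Snowflake/snowflake-arctic-embed-l", max_seq_length=512)
+ pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")
+ normalize = models.Normalize()
+ model = SentenceTransformer(modules=[transformer, pooling, normalize])
+ ```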
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("MikeCraBash/legal-ft-1")
+ # Run inference
+ sentences = [
+     'How does the number 63 relate to the overall theme or subject being discussed?',
+     '(63)',
+     '(60)',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
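+
+ This commit also ships a `query` prompt in `config_sentence_transformers.json` ("Represent this sentence for searching relevant passages: "). A minimal retrieval sketch using it; the query and passage strings are illustrative:
+
+ ```python
+ # Encode a query with the built-in "query" prompt; the prompt text is
+ # prepended automatically before tokenization.
+ query_embeddings = model.encode(
+     ["Which areas of law are unaffected by this Regulation?"],
+     prompt_name="query",
+ )
+ passage_embeddings = model.encode([
+     "Practices that are prohibited by Union law should not be affected by this Regulation.",
+     "(39)",
+ ])
+ print(model.similarity(query_embeddings, passage_embeddings))  # 1x2 score matrix
+ ```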
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.9583     |
+ | cosine_accuracy@3   | 1.0        |
+ | cosine_accuracy@5   | 1.0        |
+ | cosine_accuracy@10  | 1.0        |
+ | cosine_precision@1  | 0.9583     |
+ | cosine_precision@3  | 0.3333     |
+ | cosine_precision@5  | 0.2        |
+ | cosine_precision@10 | 0.1        |
+ | cosine_recall@1     | 0.9583     |
+ | cosine_recall@3     | 1.0        |
+ | cosine_recall@5     | 1.0        |
+ | cosine_recall@10    | 1.0        |
+ | **cosine_ndcg@10**  | **0.9792** |
+ | cosine_mrr@10       | 0.9722     |
+ | cosine_map@100      | 0.9722     |
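+
+ The accuracy@1 of 0.9583 is consistent with a small evaluation split (23 of 24 queries retrieved correctly at rank 1). As a minimal sketch, the same evaluator can be run with toy data; the actual evaluation queries and corpus are not included in this commit:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+
+ # Toy stand-ins; the real evaluation queries/corpus are not in this commit.
+ queries = {"q1": "Which areas of law are unaffected by this Regulation?"}
+ corpus = {
+     "d1": "Practices prohibited by Union law should not be affected by this Regulation.",
+     "d2": "(39)",
+ }
+ relevant_docs = {"q1": {"d1"}}  # query id -> set of relevant corpus ids
+
+ model = SentenceTransformer("MikeCraBash/legal-ft-1")
+ evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs)
+ results = evaluator(model)
+ print(results["cosine_ndcg@10"])
+ ```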
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 400 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 400 samples:
+   |         | sentence_0                                                                          | sentence_1                                                                         |
+   |:--------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
+   | type    | string                                                                              | string                                                                             |
+   | details | <ul><li>min: 10 tokens</li><li>mean: 20.45 tokens</li><li>max: 35 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 93.01 tokens</li><li>max: 186 tokens</li></ul> |
+ * Samples:
+   | sentence_0                                                                                  | sentence_1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+   |:--------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+   | <code>What types of risk analytics are permitted according to the context provided?</code> | <code>solely on profiling them or on assessing their personality traits and characteristics should be prohibited. In any case, that prohibition does not refer to or touch upon risk analytics that are not based on the profiling of individuals or on the personality traits and characteristics of individuals, such as AI systems using risk analytics to assess the likelihood of financial fraud by undertakings on the basis of suspicious transactions or risk analytic tools to predict the likelihood of the localisation of narcotics or illicit goods by customs authorities, for example on the basis of known trafficking routes.</code> |
+   | <code>Why is profiling individuals based on their personality traits prohibited?</code>    | <code>solely on profiling them or on assessing their personality traits and characteristics should be prohibited. In any case, that prohibition does not refer to or touch upon risk analytics that are not based on the profiling of individuals or on the personality traits and characteristics of individuals, such as AI systems using risk analytics to assess the likelihood of financial fraud by undertakings on the basis of suspicious transactions or risk analytic tools to predict the likelihood of the localisation of narcotics or illicit goods by customs authorities, for example on the basis of known trafficking routes.</code> |
+   | <code>What criteria determine whether an AI system is classified as high-risk?</code>      | <code>of AI systems that are high-risk and use cases that are not.</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters (a usage sketch follows this block):
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
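+
+ Because training used MatryoshkaLoss over nested dimensions (768, 512, 256, 128, 64), embeddings can be truncated to a smaller trained dimension for cheaper storage and search. A minimal sketch, assuming sentence-transformers >= 2.7, which introduced `truncate_dim`:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Load with embeddings truncated to 256 dims, one of the trained Matryoshka sizes.
+ model_256 = SentenceTransformer("MikeCraBash/legal-ft-1", truncate_dim=256)
+ embeddings = model_256.encode(["Which areas of law are unaffected by this Regulation?"])
+ print(embeddings.shape)  # (1, 256)
+ ```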
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `num_train_epochs`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | cosine_ndcg@10 |
+ |:-----:|:----:|:--------------:|
+ | 1.0   | 40   | 0.9715         |
+ | 1.25  | 50   | 0.9792         |
+ | 2.0   | 80   | 0.9715         |
+ | 2.5   | 100  | 0.9715         |
+ | 3.0   | 120  | 0.9715         |
+ | 3.75  | 150  | 0.9715         |
+ | 4.0   | 160  | 0.9792         |
+ | 5.0   | 200  | 0.9792         |
+ | 6.0   | 240  | 0.9688         |
+ | 6.25  | 250  | 0.9792         |
+ | 7.0   | 280  | 0.9715         |
+ | 7.5   | 300  | 0.9792         |
+ | 8.0   | 320  | 0.9792         |
+ | 8.75  | 350  | 0.9792         |
+ | 9.0   | 360  | 0.9792         |
+ | 10.0  | 400  | 0.9792         |
+
+
+ ### Framework Versions
+ - Python: 3.11.11
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.2
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.3.0
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "Snowflake/snowflake-arctic-embed-l",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
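
The config above identifies the backbone as `Snowflake/snowflake-arctic-embed-l`, a 24-layer BERT encoder with 1024-dimensional hidden states. As a sketch, the checkpoint can also be used directly through `transformers`, replicating the CLS pooling and normalization modules by hand (input text is illustrative):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MikeCraBash/legal-ft-1")
model = AutoModel.from_pretrained("MikeCraBash/legal-ft-1")

inputs = tokenizer(["(63)"], padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state

embeddings = last_hidden[:, 0]                                  # CLS pooling, per 1_Pooling/config.json
embeddings = torch.nn.functional.normalize(embeddings, dim=-1)  # Normalize module
print(embeddings.shape)  # torch.Size([1, 1024])
```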
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.48.2",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e28956b8ea7d3fabd9c70344665972aa93b203c607ccf289186bc03311880e1
+ size 1336413848
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff