BlackBeenie committed on
Commit a74012a · verified · 1 Parent(s): 5186cfe

Add new SentenceTransformer model
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
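This configuration enables mean pooling only: the transformer's token embeddings are averaged over the sequence, with padded positions excluded via the attention mask. A minimal numpy sketch of that operation (the function name and toy numbers are illustrative, not the library's implementation):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence axis, ignoring padded positions."""
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                         # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                        # avoid divide-by-zero
    return summed / counts

# Toy batch: 1 sequence, 3 tokens (the last one is padding), dimension 2
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tokens, mask))  # [[2. 3.]]
```

Note how the padded token's large values do not contaminate the result, because the mask zeroes them out before summing.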
README.md ADDED
@@ -0,0 +1,583 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:498970
- loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-m3
widget:
- source_sentence: how long does engine take to cool down
  sentences:
  - Turn off the engine. If you can pop the hood from the driver's seat, do so —
    but don't risk opening it by hand until the engine has cooled, especially if
    you see steam wafting off the engine. It typically takes a solid 30 minutes
    for an engine to cool down enough for it to be safe to handle.
  - Realism as an art movement was led by Gustave Courbet in France. It spread across
    Europe and was influential for the rest of the century and beyond, but as it became
    adopted into the mainstream of painting it becomes less common and useful as a
    term to define artistic style.
  - As the engine cools when shut off, the contracting coolant in the radiator sucks
    back coolant from the recovery tank. Fluid in the recovery tank should never be
    below the full hot or full cold marks, lest air be sucked in. -- BETTER ANSWER
    ==. Your cooling fans are not turning on. It is not normal for your overflow tank
    to boil like that. It is true that your radiator is overflowing into the reserve
    tank, but that means your radiator is boiling. Check for blown fuses or relays
    for your cooling fans. If they are fine, run your engine for about 15 minutes and
    drive. As the engine cools when shut off, the contracting coolant in the radiator
    sucks back coolant from the recovery tank. Fluid in the recovery tank should never
    be below the full hot or full cold marks, lest air be sucked in. -- BETTER ANSWER
    ==. Your cooling fans are not turning on.
- source_sentence: what is the tax rate for food in missouri
  sentences:
  - 'MINIMUM WAGE RATE Missouri: $7.65. All employers must pay the minimum wage, except
    employers engaged in retail or service businesses whose annual gross income is
    less than $500,000. There are also certain classes of employees exempt. Missouri''s
    minimum wage is recalculated yearly based on the Consumer Price Index. St. Louis
    restaurant owner isn''t waiting for Congress or the Missouri Legislature to raise
    the minimum wage to boost his own workers'' pay. Pi Pizzeria owner Chris Sommers
    says he will pay all of his employees $10.10 an hour starting on April 1, instead
    of the Missouri state''s minimum of $7.50.'
  - Rating Newest Oldest. 1 MO probably has the most insanely complex sales tax in
    the country. Not only is there a state level tax (4.225% for most items and 1.225%
    for grocery foods) but city and county level sales taxes. 2 The sales tax is
    set by county. Go to the Missouri Sales Tax website and look up your county.
  - 'Arthritis: The health benefits of copper relate to its anti-inflammatory actions
    that assist in reducing the symptoms of arthritis. The consumer market is also
    flooded with copper bracelets as well as other accessories for curing this condition.'
- source_sentence: is woolwich london safe
  sentences:
  - SE18 has four train stations Plumstead, Woolwich Arsenal and Woolwich Dockyard.
    Plumstead and Woolwich Arsenal are situated in Zone 4, Woolwich Dockyard in Zone
    3. Approximately just under 30 minutes to Charing Cross from all stations. Trains
    are operated by Southeastern. Train timetables are available at southeasternrailway.co.uk.
    There is no shortage of schools, libraries and colleges in SE18. A short walk from
    Plumstead station is Greenwich Community College offering a wide range of courses
    from cookery to languages. Notable schools include the newly re-built Foxfield
    Primary, Saint Pauls and Plumstead Mannor.
  - Karine Adria, Gamer. Views. If you mean will Mount Everest ever erupt lava like
    a volcano then no. It is a mountain part of the Himalayan range which is an orogenic
    mountain belt formed as a result of a continental collision along the convergent
    plate boundary between the Indo-Australian Plate and the Eurasian Plate. The process
    of how volcanoes are formed is totally different.
  - In its heyday Woolwich was better known as the home of Arsenal Football Club,
    the first McDonalds in the UK and the base for the British Army's artillery.
    At present, it is safe to say the town would not be found in any London travel
    guide.
- source_sentence: what ocean is around hawaii
  sentences:
  - 'Hawaii: Aloha! Whether you are hoping to travel to Hawaii for a tropical green
    Christmas or you are hoping to make this island paradise your home, we can help
    you find the information you need! The state of Hawaii, located in the middle
    of the Pacific Ocean, is farther away from any other landmass than any other island
    on the earth.'
  - Deadband is also known as hysteresis and is the amount of a measured variable
    (pressure, temperature, etc.) between the point where a switch closes and then
    re-opens. It can be implemented in software if you have an analog input (i.e.,
    open at 1.02 PSI, close at 0.98 PSI) or designed into a switching mechanism. Often
    the deadband of a switch is fixed and cannot be adjusted. A typical example is
    in a wall thermostat in your house.
  - Under the Sea, On the Stage. Enter a world where the ocean floor is full of fish
    and Ariel longs to stroll and dance with humans. Read More. Enter a world where
    the ocean floor is full of fish and Ariel longs to stroll and dance with humans.
- source_sentence: who is christopher kyle
  sentences:
  - '''American Sniper'' Chris Kyle''s wife thanks audiences for ''watching the hard
    stuff''. Taya Kyle has told of her gratitude to audiences for supporting the film
    about her dead husband Chris Kyle, a Navy Seal played by Bradley Cooper.'
  - Chris Kyle American Sniper. Christopher Scott Kyle was born and raised in Texas
    and was a United States Navy SEAL from 1999 to 2009. He is currently known as
    the most successful sniper in American military history. According to his book
    American Sniper, he had 160 confirmed kills (which was from 255 claimed kills).
  - A passport card is valid for travel to and from Canada, Mexico, the Caribbean
    and Bermuda at land border crossings and sea ports-of-entry. It is not valid for
    air travel. It is valid for 10 years for adults and 5 years for minors under 16.
    A first passport book costs $135 for adults and $105 for minors under the age
    of 16. It costs $110 to renew. A first passport card costs $55 for adults and
    $40 for minors under the age of 16. It costs $30 to renew. The cost when applying
    for both is $165 for adults and $120 for minors.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on BAAI/bge-m3

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) <!-- at revision 5617a9f61b028005a4858fdac845db406aefb181 -->
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BlackBeenie/bge-m3-msmarco-v3-sbert")
# Run inference
sentences = [
    'who is christopher kyle',
    'Chris Kyle American Sniper. Christopher Scott Kyle was born and raised in Texas and was a United States Navy SEAL from 1999 to 2009. He is currently known as the most successful sniper in American military history. According to his book American Sniper, he had 160 confirmed kills (which was from 255 claimed kills).',
    "'American Sniper' Chris Kyle's wife thanks audiences for 'watching the hard stuff'. Taya Kyle has told of her gratitude to audiences for supporting the film about her dead husband Chris Kyle, a Navy Seal played by Bradley Cooper.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
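`model.similarity` uses cosine similarity by default (the model's `similarity_fn_name` is `cosine`). As a sketch of what that call computes, here is the same operation in plain numpy; the vectors below are toy stand-ins, not real 1024-dimensional embeddings:

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 toy "embeddings"
sims = cosine_similarity_matrix(emb, emb)
print(sims.shape)                    # (3, 3)
print(round(float(sims[0, 2]), 4))   # 0.7071, i.e. cos(45 degrees)
```

For retrieval, you would typically rank passages by their similarity to the query row and take the argmax or top-k.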

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 498,970 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>sentence_2</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0 | sentence_1 | sentence_2 |
  |:--------|:-----------|:-----------|:-----------|
  | type    | string | string | string |
  | details | <ul><li>min: 4 tokens</li><li>mean: 9.93 tokens</li><li>max: 37 tokens</li></ul> | <ul><li>min: 17 tokens</li><li>mean: 90.01 tokens</li><li>max: 239 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 86.47 tokens</li><li>max: 229 tokens</li></ul> |
* Samples:
  | sentence_0 | sentence_1 | sentence_2 |
  |:-----------|:-----------|:-----------|
  | <code>how much does it cost to paint a interior house</code> | <code>Interior House Painting Cost Factors. Generally, it will take a minimum of two gallons of paint to cover a room. At the highest end, paint will cost anywhere between $30 and $60 per gallon and come in three different finishes: flat, semi-gloss or high-gloss.Flat finishes are the least shiny and are best suited for areas requiring frequent cleaning.rovide a few details about your project and receive competitive quotes from local pros. The average national cost to paint a home interior is $1,671, with most homeowners spending between $966 and $2,426.</code> | <code>Question DetailsAsked on 3/12/2014. Guest_... How much does it cost per square foot to paint the interior of a house? We just bought roughly a 1500 sg ft townhouse and want to get the entire house, including ceilings painted (including a roughly 400 sq ft finished basement not included in square footage).</code> |
  | <code>when is s corp taxes due</code> | <code>If you form a corporate entity for your small business, regardless of whether it's taxed as a C or S corporation, a tax return must be filed with the Internal Revenue Service on its due date each year. Corporate tax returns are always due on the 15th day of the third month following the close of the tax year. The actual day that the tax return filing deadline falls on, however, isn't the same for every corporation.</code> | <code>But if you haven’t, don’t panic: the majority of forms aren’t due quite yet. Most tax forms have an annual January 31 due date. Your tax forms are considered on time if the form is properly addressed and mailed on or before that date. If the regular due date falls on a Saturday, Sunday, or legal holiday – which is the case in 2015 for both January and February due dates – issuers have until the next business day.</code> |
  | <code>what are disaccharides</code> | <code>Disaccharides are formed when two monosaccharides are joined together and a molecule of water is removed, a process known as dehydration reaction. For example; milk sugar (lactose) is made from glucose and galactose whereas the sugar from sugar cane and sugar beets (sucrose) is made from glucose and fructose.altose, another notable disaccharide, is made up of two glucose molecules. The two monosaccharides are bonded via a dehydration reaction (also called a condensation reaction or dehydration synthesis) that leads to the loss of a molecule of water and formation of a glycosidic bond.</code> | <code>Other disaccharides include (diagrams p. 364): Sucrose, common table sugar, has a glycosidic bond linking the anomeric hydroxyls of glucose and fructose. Because the configuration at the anomeric carbon of glucose is a (O points down from the ring), the linkage is designated a(12).</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
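MultipleNegativesRankingLoss uses in-batch negatives: for query *i*, passage *i* is the positive and every other passage in the batch serves as a negative. Cosine similarities are multiplied by the configured scale (20.0 here) and scored with softmax cross-entropy whose target class is the diagonal. A small numpy sketch of that objective, with illustrative embeddings rather than the actual training code:

```python
import numpy as np

def mnr_loss(query_emb: np.ndarray, passage_emb: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities; row i's positive is column i."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    logits = scale * (q @ p.T)                   # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())   # off-diagonal entries act as negatives

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
positives = queries + 0.05 * rng.normal(size=(4, 8))  # near-duplicates: loss should be small
print(mnr_loss(queries, positives))
```

Because the batch supplies the negatives for free, larger batch sizes make the task harder and generally improve the resulting embeddings.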

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `num_train_epochs`: 5
- `fp16`: True
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs
<details><summary>Click to expand</summary>

| Epoch  | Step  | Training Loss |
|:------:|:-----:|:-------------:|
| 0.0321 | 500   | 0.3086        |
| 0.0641 | 1000  | 0.2339        |
| 0.0962 | 1500  | 0.2289        |
| 0.1283 | 2000  | 0.2262        |
| 0.1603 | 2500  | 0.2213        |
| 0.1924 | 3000  | 0.2158        |
| 0.2245 | 3500  | 0.2101        |
| 0.2565 | 4000  | 0.2082        |
| 0.2886 | 4500  | 0.2107        |
| 0.3207 | 5000  | 0.2015        |
| 0.3527 | 5500  | 0.2023        |
| 0.3848 | 6000  | 0.201         |
| 0.4169 | 6500  | 0.1974        |
| 0.4489 | 7000  | 0.191         |
| 0.4810 | 7500  | 0.1956        |
| 0.5131 | 8000  | 0.2           |
| 0.5451 | 8500  | 0.191         |
| 0.5772 | 9000  | 0.1888        |
| 0.6092 | 9500  | 0.1885        |
| 0.6413 | 10000 | 0.1936        |
| 0.6734 | 10500 | 0.1944        |
| 0.7054 | 11000 | 0.1806        |
| 0.7375 | 11500 | 0.1834        |
| 0.7696 | 12000 | 0.1853        |
| 0.8016 | 12500 | 0.1823        |
| 0.8337 | 13000 | 0.1827        |
| 0.8658 | 13500 | 0.1821        |
| 0.8978 | 14000 | 0.1724        |
| 0.9299 | 14500 | 0.1745        |
| 0.9620 | 15000 | 0.1776        |
| 0.9940 | 15500 | 0.1781        |
| 1.0    | 15593 | -             |
| 1.0261 | 16000 | 0.1133        |
| 1.0582 | 16500 | 0.0964        |
| 1.0902 | 17000 | 0.0931        |
| 1.1223 | 17500 | 0.0947        |
| 1.1544 | 18000 | 0.097         |
| 1.1864 | 18500 | 0.0977        |
| 1.2185 | 19000 | 0.096         |
| 1.2506 | 19500 | 0.1005        |
| 1.2826 | 20000 | 0.1008        |
| 1.3147 | 20500 | 0.0998        |
| 1.3468 | 21000 | 0.0972        |
| 1.3788 | 21500 | 0.0992        |
| 1.4109 | 22000 | 0.0994        |
| 1.4430 | 22500 | 0.1029        |
| 1.4750 | 23000 | 0.1008        |
| 1.5071 | 23500 | 0.0985        |
| 1.5392 | 24000 | 0.1013        |
| 1.5712 | 24500 | 0.1027        |
| 1.6033 | 25000 | 0.0988        |
| 1.6353 | 25500 | 0.0982        |
| 1.6674 | 26000 | 0.0994        |
| 1.6995 | 26500 | 0.0998        |
| 1.7315 | 27000 | 0.0989        |
| 1.7636 | 27500 | 0.101         |
| 1.7957 | 28000 | 0.099         |
| 1.8277 | 28500 | 0.096         |
| 1.8598 | 29000 | 0.0989        |
| 1.8919 | 29500 | 0.1011        |
| 1.9239 | 30000 | 0.0974        |
| 1.9560 | 30500 | 0.0999        |
| 1.9881 | 31000 | 0.0976        |
| 2.0    | 31186 | -             |
| 2.0201 | 31500 | 0.0681        |
| 2.0522 | 32000 | 0.0478        |
| 2.0843 | 32500 | 0.0483        |
| 2.1163 | 33000 | 0.0485        |
| 2.1484 | 33500 | 0.0472        |
| 2.1805 | 34000 | 0.0482        |
| 2.2125 | 34500 | 0.0491        |
| 2.2446 | 35000 | 0.0484        |
| 2.2767 | 35500 | 0.0493        |
| 2.3087 | 36000 | 0.0484        |
| 2.3408 | 36500 | 0.0503        |
| 2.3729 | 37000 | 0.0498        |
| 2.4049 | 37500 | 0.0507        |
| 2.4370 | 38000 | 0.0502        |
| 2.4691 | 38500 | 0.0508        |
| 2.5011 | 39000 | 0.0483        |
| 2.5332 | 39500 | 0.0486        |
| 2.5653 | 40000 | 0.0494        |
| 2.5973 | 40500 | 0.0511        |
| 2.6294 | 41000 | 0.0508        |
| 2.6615 | 41500 | 0.0496        |
| 2.6935 | 42000 | 0.0487        |
| 2.7256 | 42500 | 0.0497        |
| 2.7576 | 43000 | 0.0491        |
| 2.7897 | 43500 | 0.0486        |
| 2.8218 | 44000 | 0.0503        |
| 2.8538 | 44500 | 0.0504        |
| 2.8859 | 45000 | 0.0499        |
| 2.9180 | 45500 | 0.048         |
| 2.9500 | 46000 | 0.047         |
| 2.9821 | 46500 | 0.0497        |
| 3.0    | 46779 | -             |
| 3.0142 | 47000 | 0.0395        |
| 3.0462 | 47500 | 0.0247        |
| 3.0783 | 48000 | 0.0256        |
| 3.1104 | 48500 | 0.0254        |
| 3.1424 | 49000 | 0.0247        |
| 3.1745 | 49500 | 0.0251        |
| 3.2066 | 50000 | 0.0253        |
| 3.2386 | 50500 | 0.0263        |
| 3.2707 | 51000 | 0.0261        |
| 3.3028 | 51500 | 0.0259        |
| 3.3348 | 52000 | 0.0256        |
| 3.3669 | 52500 | 0.0254        |
| 3.3990 | 53000 | 0.026         |
| 3.4310 | 53500 | 0.0255        |
| 3.4631 | 54000 | 0.0255        |
| 3.4952 | 54500 | 0.0257        |
| 3.5272 | 55000 | 0.0249        |
| 3.5593 | 55500 | 0.0251        |
| 3.5914 | 56000 | 0.026         |
| 3.6234 | 56500 | 0.0246        |
| 3.6555 | 57000 | 0.0258        |
| 3.6876 | 57500 | 0.0266        |
| 3.7196 | 58000 | 0.0242        |
| 3.7517 | 58500 | 0.0251        |
| 3.7837 | 59000 | 0.0243        |
| 3.8158 | 59500 | 0.0249        |
| 3.8479 | 60000 | 0.0252        |
| 3.8799 | 60500 | 0.0251        |
| 3.9120 | 61000 | 0.025         |
| 3.9441 | 61500 | 0.0249        |
| 3.9761 | 62000 | 0.0254        |
| 4.0    | 62372 | -             |
| 4.0082 | 62500 | 0.0221        |
| 4.0403 | 63000 | 0.0146        |
| 4.0723 | 63500 | 0.0146        |
| 4.1044 | 64000 | 0.0152        |
| 4.1365 | 64500 | 0.0153        |
| 4.1685 | 65000 | 0.0144        |
| 4.2006 | 65500 | 0.0154        |
| 4.2327 | 66000 | 0.0137        |
| 4.2647 | 66500 | 0.0145        |
| 4.2968 | 67000 | 0.0148        |
| 4.3289 | 67500 | 0.0148        |
| 4.3609 | 68000 | 0.0142        |
| 4.3930 | 68500 | 0.0148        |
| 4.4251 | 69000 | 0.0155        |
| 4.4571 | 69500 | 0.0148        |
| 4.4892 | 70000 | 0.0144        |
| 4.5213 | 70500 | 0.0144        |
| 4.5533 | 71000 | 0.0148        |
| 4.5854 | 71500 | 0.015         |
| 4.6175 | 72000 | 0.0149        |
| 4.6495 | 72500 | 0.0135        |
| 4.6816 | 73000 | 0.0142        |
| 4.7137 | 73500 | 0.0152        |
| 4.7457 | 74000 | 0.0144        |
| 4.7778 | 74500 | 0.0143        |
| 4.8099 | 75000 | 0.0141        |
| 4.8419 | 75500 | 0.0146        |
| 4.8740 | 76000 | 0.0142        |
| 4.9060 | 76500 | 0.0142        |
| 4.9381 | 77000 | 0.0147        |
| 4.9702 | 77500 | 0.0145        |
| 5.0    | 77965 | -             |

</details>
### Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.48.3
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,28 @@
{
  "_name_or_path": "BAAI/bge-m3",
  "architectures": [
    "XLMRobertaModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 8194,
  "model_type": "xlm-roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "output_past": true,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.48.3",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 250002
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "3.4.1",
    "transformers": "4.48.3",
    "pytorch": "2.5.1+cu124"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9b8a6dc809bf17b82c4c6766c09fb317f585d2ef87f77ed6b4ee6f5c6067764d
size 2271064456
modules.json ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 8192,
  "do_lower_case": false
}
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
size 5069051
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e4f7e21bec3fb0044ca0bb2d50eb5d4d8c596273c422baef84466d2c73748b9c
size 17083053
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "250001": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "mask_token": "<mask>",
  "model_max_length": 8192,
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "sp_model_kwargs": {},
  "tokenizer_class": "XLMRobertaTokenizer",
  "unk_token": "<unk>"
}