bakrianoo committed
Commit 35ede35 · verified · 1 Parent(s): 06bd202

Update README.md

Files changed (1):
  1. README.md +180 -223

README.md CHANGED
@@ -18,34 +18,7 @@ tags:
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
- - dataset_size:34436
  - loss:CosineSimilarityLoss
- widget:
- - source_sentence: Three men are playing chess.
-   sentences:
-   - Two men are fighting.
-   - امرأة تحمل و تحمل طفل كنغر
-   - Two men are playing chess.
- - source_sentence: Two men are playing chess.
-   sentences:
-   - رجل يعزف على الغيتار و يغني
-   - Three men are playing chess.
-   - طائرة طيران تقلع
- - source_sentence: Two men are playing chess.
-   sentences:
-   - A man is playing a large flute. رجل يعزف على ناي كبير
-   - The man is playing the piano. الرجل يعزف على البيانو
-   - Three men are playing chess.
- - source_sentence: الرجل يعزف على البيانو The man is playing the piano.
-   sentences:
-   - رجل يجلس ويلعب الكمان A man seated is playing the cello.
-   - ثلاثة رجال يلعبون الشطرنج.
-   - الرجل يعزف على الغيتار The man is playing the guitar.
- - source_sentence: الرجل ضرب الرجل الآخر بعصا The man hit the other man with a stick.
-   sentences:
-   - الرجل صفع الرجل الآخر بعصا The man spanked the other man with a stick.
-   - A plane is taking off.
-   - A man is smoking. رجل يدخن
  model-index:
  - name: SentenceTransformer based on silma-ai/silma-embeddding-matryoshka-0.1
    results:
@@ -123,6 +96,10 @@ model-index:
    - type: spearman_max
      value: 0.8530609768738506
      name: Spearman Max
+ license: apache-2.0
+ language:
+ - ar
+ - en
  ---

  # SentenceTransformer based on silma-ai/silma-embeddding-matryoshka-0.1
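The Spearman figures in the metadata above come from an STS dev-set evaluation. A minimal sketch of how such a figure can be reproduced with the library's `EmbeddingSimilarityEvaluator`; the three scored pairs below are illustrative stand-ins (the real dev set holds 100 pairs), and the import path is assumed from the sentence-transformers evaluation module:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, SimilarityFunction

model = SentenceTransformer("silma-ai/silma-embeddding-sts-0.1")

# Illustrative scored pairs only; the middle pair and its score are made up.
sentences1 = ["طائرة ستقلع", "A plane is taking off.", "رجل يعزف على الناي"]
sentences2 = ["طائرة طيران تقلع", "A man is smoking.", "رجل يعزف على فرقة الخيزران"]
scores = [1.0, 0.1, 0.77]

evaluator = EmbeddingSimilarityEvaluator(
    sentences1, sentences2, scores,
    main_similarity=SimilarityFunction.COSINE,
    name="sts-dev",
)
# Returns the evaluation metrics, including a Spearman correlation of
# cosine similarities against the gold scores (spearman_cosine).
print(evaluator(model))
```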
@@ -133,28 +110,10 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [s

  ### Model Description
  - **Model Type:** Sentence Transformer
- - **Base model:** [silma-ai/silma-embeddding-matryoshka-0.1](https://huggingface.co/silma-ai/silma-embeddding-matryoshka-0.1) <!-- at revision 9eb50734f432656a01e1f88d28fa9a6fe8b9e148 -->
+ - **Base model:** [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02)
  - **Maximum Sequence Length:** 512 tokens
  - **Output Dimensionality:** 768 tokens
  - **Similarity Function:** Cosine Similarity
- <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->
-
- ### Model Sources
-
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
-
- ### Full Model Architecture
-
- ```
- SentenceTransformer(
-   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
-   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
- )
- ```

  ## Usage
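Because the base model was trained with a Matryoshka objective (the metrics above are reported at 512 and 256 dimensions as well as 768), the embeddings can be truncated to smaller sizes with little quality loss. A minimal sketch, assuming the `truncate_dim` option available in recent `sentence-transformers` releases (>= 2.7); the samples below use the full 768 dimensions:

```python
from sentence_transformers import SentenceTransformer

# Load the model and truncate Matryoshka embeddings to 256 dimensions.
# `truncate_dim` is an assumption about your installed library version.
model = SentenceTransformer("silma-ai/silma-embeddding-sts-0.1", truncate_dim=256)

embeddings = model.encode(["الطقس اليوم مشمس", "The weather is sunny today"])
print(embeddings.shape)  # (2, 256)
```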
@@ -166,26 +125,160 @@ First install the Sentence Transformers library:
  pip install -U sentence-transformers
  ```

- Then you can load this model and run inference.
+ Then load the model:

  ```python
  from sentence_transformers import SentenceTransformer
+ from sentence_transformers.util import cos_sim

- # Download from the 🤗 Hub
  model = SentenceTransformer("silma-ai/silma-embeddding-sts-0.1")
- # Run inference
- sentences = [
-     'الرجل ضرب الرجل الآخر بعصا The man hit the other man with a stick.',
-     'الرجل صفع الرجل الآخر بعصا The man spanked the other man with a stick.',
-     'A man is smoking. رجل يدخن',
- ]
- embeddings = model.encode(sentences)
- print(embeddings.shape)
- # [3, 768]
-
- # Get the similarity scores for the embeddings
- similarities = model.similarity(embeddings, embeddings)
- print(similarities.shape)
- # [3, 3]
+ ```
+
+ ### Samples
+
+ #### [+] Short Sentence Similarity
+
+ **Arabic**
+ ```python
+ query = "الطقس اليوم مشمس"
+ sentence_1 = "الجو اليوم كان مشمسًا ورائعًا"
+ sentence_2 = "الطقس اليوم غائم"
+
+ query_embedding = model.encode(query)
+
+ print("sentence_1_similarity:", cos_sim(query_embedding, model.encode(sentence_1))[0][0].tolist())
+ print("sentence_2_similarity:", cos_sim(query_embedding, model.encode(sentence_2))[0][0].tolist())
+
+ # ======= Output
+ # sentence_1_similarity: 0.42602288722991943
+ # sentence_2_similarity: 0.10798501968383789
+ # =======
+ ```
+
+ **English**
+ ```python
+ query = "The weather is sunny today"
+ sentence_1 = "The morning was bright and sunny"
+ sentence_2 = "it is too cloudy today"
+
+ query_embedding = model.encode(query)
+
+ print("sentence_1_similarity:", cos_sim(query_embedding, model.encode(sentence_1))[0][0].tolist())
+ print("sentence_2_similarity:", cos_sim(query_embedding, model.encode(sentence_2))[0][0].tolist())
+
+ # ======= Output
+ # sentence_1_similarity: 0.5796191692352295
+ # sentence_2_similarity: 0.21948376297950745
+ # =======
+ ```
+
+ #### [+] Long Sentence Similarity
+
+ **Arabic**
+ ```python
+ query = "الكتاب يتحدث عن أهمية الذكاء الاصطناعي في تطوير المجتمعات الحديثة"
+ sentence_1 = "في هذا الكتاب، يناقش الكاتب كيف يمكن للتكنولوجيا أن تغير العالم"
+ sentence_2 = "الكاتب يتحدث عن أساليب الطبخ التقليدية في دول البحر الأبيض المتوسط"
+
+ query_embedding = model.encode(query)
+
+ print("sentence_1_similarity:", cos_sim(query_embedding, model.encode(sentence_1))[0][0].tolist())
+ print("sentence_2_similarity:", cos_sim(query_embedding, model.encode(sentence_2))[0][0].tolist())
+
+ # ======= Output
+ # sentence_1_similarity: 0.5725120306015015
+ # sentence_2_similarity: 0.22617210447788239
+ # =======
+ ```
+
+ **English**
+ ```python
+ query = "China said on Saturday it would issue special bonds to help its sputtering economy, signalling a spending spree to bolster banks"
+ sentence_1 = "The Chinese government announced plans to release special bonds aimed at supporting its struggling economy and stabilizing the banking sector."
+ sentence_2 = "Several countries are preparing for a global technology summit to discuss advancements in bolster global banks."
+
+ query_embedding = model.encode(query)
+
+ print("sentence_1_similarity:", cos_sim(query_embedding, model.encode(sentence_1))[0][0].tolist())
+ print("sentence_2_similarity:", cos_sim(query_embedding, model.encode(sentence_2))[0][0].tolist())
+
+ # ======= Output
+ # sentence_1_similarity: 0.6438770294189453
+ # sentence_2_similarity: 0.4720292389392853
+ # =======
+ ```
+
+ #### [+] Question to Paragraph Matching
+
+ **Arabic**
+ ```python
+ query = "ما هي فوائد ممارسة الرياضة؟"
+ sentence_1 = "ممارسة الرياضة بشكل منتظم تساعد على تحسين الصحة العامة واللياقة البدنية"
+ sentence_2 = "تعليم الأطفال في سن مبكرة يساعدهم على تطوير المهارات العقلية بسرعة"
+
+ query_embedding = model.encode(query)
+
+ print("sentence_1_similarity:", cos_sim(query_embedding, model.encode(sentence_1))[0][0].tolist())
+ print("sentence_2_similarity:", cos_sim(query_embedding, model.encode(sentence_2))[0][0].tolist())
+
+ # ======= Output
+ # sentence_1_similarity: 0.6058318614959717
+ # sentence_2_similarity: 0.006831036880612373
+ # =======
+ ```
+
+ **English**
+ ```python
+ query = "What are the benefits of exercising?"
+ sentence_1 = "Regular exercise helps improve overall health and physical fitness"
+ sentence_2 = "Teaching children at an early age helps them develop cognitive skills quickly"
+
+ query_embedding = model.encode(query)
+
+ print("sentence_1_similarity:", cos_sim(query_embedding, model.encode(sentence_1))[0][0].tolist())
+ print("sentence_2_similarity:", cos_sim(query_embedding, model.encode(sentence_2))[0][0].tolist())
+
+ # ======= Output
+ # sentence_1_similarity: 0.3593001365661621
+ # sentence_2_similarity: 0.06493218243122101
+ # =======
+ ```
+
+ #### [+] Message to Intent-Name Mapping
+
+ **Arabic**
+ ```python
+ query = "أرغب في حجز تذكرة طيران من دبي الى القاهرة يوم الثلاثاء القادم"
+ sentence_1 = "حجز رحلة"
+ sentence_2 = "إلغاء حجز"
+
+ query_embedding = model.encode(query)
+
+ print("sentence_1_similarity:", cos_sim(query_embedding, model.encode(sentence_1))[0][0].tolist())
+ print("sentence_2_similarity:", cos_sim(query_embedding, model.encode(sentence_2))[0][0].tolist())
+
+ # ======= Output
+ # sentence_1_similarity: 0.4646468162536621
+ # sentence_2_similarity: 0.19563665986061096
+ # =======
+ ```
+
+ **English**
+ ```python
+ query = "Please send and email to all of the managers"
+ sentence_1 = "send email"
+ sentence_2 = "read inbox emails"
+
+ query_embedding = model.encode(query)
+
+ print("sentence_1_similarity:", cos_sim(query_embedding, model.encode(sentence_1))[0][0].tolist())
+ print("sentence_2_similarity:", cos_sim(query_embedding, model.encode(sentence_2))[0][0].tolist())
+
+ # ======= Output
+ # sentence_1_similarity: 0.6096147298812866
+ # sentence_2_similarity: 0.42170101404190063
+ # =======
+
  ```

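The pairwise pattern in the samples above extends to ranking: encode all candidates in one batch and `cos_sim` returns a full score matrix. A small sketch reusing the intent example; the four-entry candidate inventory is hypothetical:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("silma-ai/silma-embeddding-sts-0.1")

query = "أرغب في حجز تذكرة طيران من دبي الى القاهرة يوم الثلاثاء القادم"
intents = ["حجز رحلة", "إلغاء حجز", "send email", "read inbox emails"]  # hypothetical inventory

# One encode call per batch; cos_sim yields a (1, len(intents)) matrix.
scores = cos_sim(model.encode([query]), model.encode(intents))[0]
best = int(scores.argmax())
print(intents[best], float(scores[best]))
```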
  <!--
@@ -264,188 +357,52 @@ You can finetune this model on your own dataset.

  ## Training Details

- ### Training Dataset
-
- #### Unnamed Dataset
-
- * Size: 34,436 training samples
- * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
- * Approximate statistics based on the first 1000 samples:
-   |         | sentence1 | sentence2 | score |
-   |:--------|:----------|:----------|:------|
-   | type    | string    | string    | float |
-   | details | <ul><li>min: 4 tokens</li><li>mean: 15.18 tokens</li><li>max: 42 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 15.18 tokens</li><li>max: 42 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.54</li><li>max: 1.0</li></ul> |
- * Samples:
-   | sentence1 | sentence2 | score |
-   |:----------|:----------|:------|
-   | <code>A woman picks up and holds a baby kangaroo in her arms. امرأة تحمل في ذراعها طفل كنغر</code> | <code>A woman picks up and holds a baby kangaroo. امرأة تحمل و تحمل طفل كنغر</code> | <code>0.92</code> |
-   | <code>امرأة تحمل و تحمل طفل كنغر A woman picks up and holds a baby kangaroo.</code> | <code>امرأة تحمل في ذراعها طفل كنغر A woman picks up and holds a baby kangaroo in her arms.</code> | <code>0.92</code> |
-   | <code>رجل يعزف على الناي</code> | <code>رجل يعزف على فرقة الخيزران</code> | <code>0.77</code> |
- * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
-   ```json
-   {
-       "loss_fct": "torch.nn.modules.loss.MSELoss"
-   }
-   ```
-
- ### Evaluation Dataset
-
- #### Unnamed Dataset
-
- * Size: 100 evaluation samples
- * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
- * Approximate statistics based on the first 100 samples:
-   |         | sentence1 | sentence2 | score |
-   |:--------|:----------|:----------|:------|
-   | type    | string    | string    | float |
-   | details | <ul><li>min: 4 tokens</li><li>mean: 15.96 tokens</li><li>max: 43 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 15.96 tokens</li><li>max: 43 tokens</li></ul> | <ul><li>min: 0.1</li><li>mean: 0.72</li><li>max: 1.0</li></ul> |
- * Samples:
-   | sentence1 | sentence2 | score |
-   |:----------|:----------|:------|
-   | <code>طائرة ستقلع</code> | <code>طائرة طيران تقلع</code> | <code>1.0</code> |
-   | <code>طائرة طيران تقلع</code> | <code>طائرة ستقلع</code> | <code>1.0</code> |
-   | <code>A plane is taking off.</code> | <code>An air plane is taking off.</code> | <code>1.0</code> |
- * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
-   ```json
-   {
-       "loss_fct": "torch.nn.modules.loss.MSELoss"
-   }
-   ```
-
- ### Training Hyperparameters
- #### Non-Default Hyperparameters
-
- - `eval_strategy`: steps
- - `per_device_train_batch_size`: 250
- - `per_device_eval_batch_size`: 10
- - `learning_rate`: 1e-06
- - `num_train_epochs`: 10
- - `bf16`: True
- - `dataloader_drop_last`: True
- - `optim`: adamw_torch_fused
- - `batch_sampler`: no_duplicates
-
- #### All Hyperparameters
- <details><summary>Click to expand</summary>
-
- - `overwrite_output_dir`: False
- - `do_predict`: False
- - `eval_strategy`: steps
- - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 250
- - `per_device_eval_batch_size`: 10
- - `per_gpu_train_batch_size`: None
- - `per_gpu_eval_batch_size`: None
- - `gradient_accumulation_steps`: 1
- - `eval_accumulation_steps`: None
- - `torch_empty_cache_steps`: None
- - `learning_rate`: 1e-06
- - `weight_decay`: 0.0
- - `adam_beta1`: 0.9
- - `adam_beta2`: 0.999
- - `adam_epsilon`: 1e-08
- - `max_grad_norm`: 1.0
- - `num_train_epochs`: 10
- - `max_steps`: -1
- - `lr_scheduler_type`: linear
- - `lr_scheduler_kwargs`: {}
- - `warmup_ratio`: 0.0
- - `warmup_steps`: 0
- - `log_level`: passive
- - `log_level_replica`: warning
- - `log_on_each_node`: True
- - `logging_nan_inf_filter`: True
- - `save_safetensors`: True
- - `save_on_each_node`: False
- - `save_only_model`: False
- - `restore_callback_states_from_checkpoint`: False
- - `no_cuda`: False
- - `use_cpu`: False
- - `use_mps_device`: False
- - `seed`: 42
- - `data_seed`: None
- - `jit_mode_eval`: False
- - `use_ipex`: False
- - `bf16`: True
- - `fp16`: False
- - `fp16_opt_level`: O1
- - `half_precision_backend`: auto
- - `bf16_full_eval`: False
- - `fp16_full_eval`: False
- - `tf32`: None
- - `local_rank`: 0
- - `ddp_backend`: None
- - `tpu_num_cores`: None
- - `tpu_metrics_debug`: False
- - `debug`: []
- - `dataloader_drop_last`: True
- - `dataloader_num_workers`: 0
- - `dataloader_prefetch_factor`: None
- - `past_index`: -1
- - `disable_tqdm`: False
- - `remove_unused_columns`: True
- - `label_names`: None
- - `load_best_model_at_end`: False
- - `ignore_data_skip`: False
- - `fsdp`: []
- - `fsdp_min_num_params`: 0
- - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- - `fsdp_transformer_layer_cls_to_wrap`: None
- - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- - `deepspeed`: None
- - `label_smoothing_factor`: 0.0
- - `optim`: adamw_torch_fused
- - `optim_args`: None
- - `adafactor`: False
- - `group_by_length`: False
- - `length_column_name`: length
- - `ddp_find_unused_parameters`: None
- - `ddp_bucket_cap_mb`: None
- - `ddp_broadcast_buffers`: False
- - `dataloader_pin_memory`: True
- - `dataloader_persistent_workers`: False
- - `skip_memory_metrics`: True
- - `use_legacy_prediction_loop`: False
- - `push_to_hub`: False
- - `resume_from_checkpoint`: None
- - `hub_model_id`: None
- - `hub_strategy`: every_save
- - `hub_private_repo`: False
- - `hub_always_push`: False
- - `gradient_checkpointing`: False
- - `gradient_checkpointing_kwargs`: None
- - `include_inputs_for_metrics`: False
- - `eval_do_concat_batches`: True
- - `fp16_backend`: auto
- - `push_to_hub_model_id`: None
- - `push_to_hub_organization`: None
- - `mp_parameters`:
- - `auto_find_batch_size`: False
- - `full_determinism`: False
- - `torchdynamo`: None
- - `ray_scope`: last
- - `ddp_timeout`: 1800
- - `torch_compile`: False
- - `torch_compile_backend`: None
- - `torch_compile_mode`: None
- - `dispatch_batches`: None
- - `split_batches`: None
- - `include_tokens_per_second`: False
- - `include_num_input_tokens_seen`: False
- - `neftune_noise_alpha`: None
- - `optim_target_modules`: None
- - `batch_eval_metrics`: False
- - `eval_on_start`: False
- - `use_liger_kernel`: False
- - `eval_use_gather_object`: False
- - `batch_sampler`: no_duplicates
- - `multi_dataset_batch_sampler`: proportional
-
- </details>
-
- ### Training Logs
+ This model was fine-tuned in two phases:
+
+ ### Phase 1
+
+ In phase `1`, we curated the dataset [silma-ai/silma-arabic-triplets-dataset-v1.0](https://huggingface.co/datasets/silma-ai/silma-arabic-triplets-dataset-v1.0), which
+ contains more than `2.25M` (anchor, positive, negative) Arabic/English triplet records.
+ The first `600` samples were held out as the `eval` dataset; the rest were used for fine-tuning.
+
+ Phase `1` produced a fine-tuned `Matryoshka` model based on [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) with the following hyperparameters:
+
+ - `per_device_train_batch_size`: 250
+ - `per_device_eval_batch_size`: 10
+ - `learning_rate`: 1e-05
+ - `num_train_epochs`: 3
+ - `bf16`: True
+ - `dataloader_drop_last`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
+ **[Training example](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/matryoshka/matryoshka_sts.py)**
+
+ ### Phase 2
+
+ In phase `2`, we curated the dataset [silma-ai/silma-arabic-english-sts-dataset-v1.0](https://huggingface.co/datasets/silma-ai/silma-arabic-english-sts-dataset-v1.0), which
+ contains more than `30k` (sentence1, sentence2, similarity-score) Arabic/English records.
+ The first `100` samples were held out as the `eval` dataset; the rest were used for fine-tuning.
+
+ Phase `2` produced the fine-tuned `STS` model, based on the phase-`1` model, with the following hyperparameters:
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 250
+ - `per_device_eval_batch_size`: 10
+ - `learning_rate`: 1e-06
+ - `num_train_epochs`: 10
+ - `bf16`: True
+ - `dataloader_drop_last`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
+ **[Training example](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/sts/training_stsbenchmark_continue_training.py)**
+
+ ### Training Logs (Phase 2)
  | Epoch | Step | Training Loss | Validation Loss | sts-dev-512_spearman_cosine | sts-dev-256_spearman_cosine |
  |:------:|:----:|:-------------:|:---------------:|:---------------------------:|:---------------------------:|
  | 0.3650 | 50 | 0.0395 | 0.0424 | 0.8486 | 0.8487 |
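For reference, phase `1` can be reproduced roughly as follows. This is a sketch, not the exact training script: the card does not name the phase-`1` loss, so `MultipleNegativesRankingLoss` wrapped in `MatryoshkaLoss` is an assumption (the usual pairing for (anchor, positive, negative) triplets), the `matryoshka_dims`, `output_dir`, and dataset column names are assumptions too, and the v3 `SentenceTransformerTrainer` API is assumed from the listed Transformers-style hyperparameters:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("aubmindlab/bert-base-arabertv02")

# First 600 rows held out for eval, the rest used for fine-tuning (as stated above).
dataset = load_dataset("silma-ai/silma-arabic-triplets-dataset-v1.0", split="train")
eval_dataset = dataset.select(range(600))
train_dataset = dataset.select(range(600, len(dataset)))

# MatryoshkaLoss supervises nested prefixes of the 768-d embedding;
# the inner triplet loss is an assumption, not stated on the card.
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[768, 512, 256],
)

args = SentenceTransformerTrainingArguments(
    output_dir="matryoshka-phase-1",  # hypothetical path
    num_train_epochs=3,
    per_device_train_batch_size=250,
    per_device_eval_batch_size=10,
    learning_rate=1e-5,
    bf16=True,
    dataloader_drop_last=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
).train()
```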
 
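Phase `2` then continues from the phase-`1` checkpoint with the `CosineSimilarityLoss` named in the metadata, which regresses `cos(u, v)` onto the gold score via MSE. The same caveats apply (`output_dir` and dataset column names are assumptions); a sketch:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss
from sentence_transformers.training_args import BatchSamplers

# Continue training from the phase-1 Matryoshka checkpoint.
model = SentenceTransformer("silma-ai/silma-embeddding-matryoshka-0.1")

# First 100 rows held out for eval, the rest used for fine-tuning (as stated above).
dataset = load_dataset("silma-ai/silma-arabic-english-sts-dataset-v1.0", split="train")
eval_dataset = dataset.select(range(100))
train_dataset = dataset.select(range(100, len(dataset)))

# CosineSimilarityLoss fits cosine similarity to the gold score with MSE.
loss = CosineSimilarityLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="sts-phase-2",  # hypothetical path
    eval_strategy="steps",
    num_train_epochs=10,
    per_device_train_batch_size=250,
    per_device_eval_batch_size=10,
    learning_rate=1e-6,
    bf16=True,
    dataloader_drop_last=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
).train()
```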