bobox committed on
Commit
2a8f28a
·
verified ·
1 Parent(s): da39aea

10 epoch 32 batch

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
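
The config above enables only `pooling_mode_mean_tokens`, so the sentence embedding is the average of the token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy vectors are assumptions made for illustration.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of non-padding tokens (mask value 1)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Hypothetical 2-dimensional token embeddings; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

In the real model each token vector has 768 entries (`word_embedding_dimension`), and the padding token is excluded from the average just as in the toy example.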
README.md ADDED
@@ -0,0 +1,715 @@
+ ---
+ language:
+ - en
+ library_name: sentence-transformers
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:689221
+ - loss:MultipleNegativesRankingLoss
+ - loss:CoSENTLoss
+ - loss:GISTEmbedLoss
+ - loss:OnlineContrastiveLoss
+ - loss:MultipleNegativesSymmetricRankingLoss
+ base_model: microsoft/deberta-v3-small
+ datasets:
+ - sentence-transformers/all-nli
+ - sentence-transformers/stsb
+ - tals/vitaminc
+ - nyu-mll/glue
+ - allenai/scitail
+ - sentence-transformers/xsum
+ - sentence-transformers/sentence-compression
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ widget:
+ - source_sentence: What are the exceptions in the constitution that require special
+     considerations to amend?
+   sentences:
+   - The river makes a distinctive turn to the north near Chur.
+   - The Victorian Constitution can be amended by the Parliament of Victoria, except
+     for certain "entrenched" provisions that require either an absolute majority in
+     both houses, a three-fifths majority in both houses, or the approval of the Victorian
+     people in a referendum, depending on the provision.
+   - A new arrangement of the theme, once again by Gold, was introduced in the 2007
+     Christmas special episode, "Voyage of the Damned"; Gold returned as composer for
+     the 2010 series.
+ - source_sentence: What is the name of a Bodhisattva vow?
+   sentences:
+   - In Tibetan Buddhism the teachers of Dharma in Tibet are most commonly called a
+     Lama.
+   - This origin of chloroplasts was first suggested by the Russian biologist Konstantin
+     Mereschkowski in 1905 after Andreas Schimper observed in 1883 that chloroplasts
+     closely resemble cyanobacteria.
+   - The announcement came a day after Setanta Sports confirmed that it would launch
+     in March as a subscription service on the digital terrestrial platform, and on
+     the same day that NTL's services re-branded as Virgin Media.
+ - source_sentence: Two dogs run around inside a fence.
+   sentences:
+   - A young woman tennis player have many tennis balls.
+   - Two dogs are inside a fence.
+   - A little girl in red plays tennis.
+ - source_sentence: A little boy wearing a blue stiped shirt has a party hat on his
+     head and is playing in a puddle.
+   sentences:
+   - The party boy is playing in a puddle.
+   - There is a crowd
+   - Four people are skiing
+ - source_sentence: Two wrestlers jump in a ring while an official watches.
+   sentences:
+   - The man was walking.
+   - Two men are dressed in makeup
+   - Two wrestlers were just tagged in on a tag team match.
+ pipeline_tag: sentence-similarity
+ model-index:
+ - name: SentenceTransformer based on microsoft/deberta-v3-small
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test
+       type: sts-test
+     metrics:
+     - type: pearson_cosine
+       value: 0.7827777535990615
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.7930096932283699
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: 0.7959463678643859
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.792182337344966
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: 0.7948115210006163
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.7907409787879929
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.7150471304135075
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.6966062484321753
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.7959463678643859
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.7930096932283699
+       name: Spearman Max
+ ---
116
+
117
+ # SentenceTransformer based on microsoft/deberta-v3-small
118
+
119
+ This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) on the [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli), [sts-label](https://huggingface.co/datasets/sentence-transformers/stsb), [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc), [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue), [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail), [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail), [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum) and [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression) datasets. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
120
+
121
+ ## Model Details
122
+
123
+ ### Model Description
124
+ - **Model Type:** Sentence Transformer
125
+ - **Base model:** [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small) <!-- at revision a36c739020e01763fe789b4b85e2df55d6180012 -->
126
+ - **Maximum Sequence Length:** 512 tokens
127
+ - **Output Dimensionality:** 768 dimensions
128
+ - **Similarity Function:** Cosine Similarity
129
+ - **Training Datasets:**
130
+ - [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli)
131
+ - [sts-label](https://huggingface.co/datasets/sentence-transformers/stsb)
132
+ - [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc)
133
+ - [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue)
134
+ - [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail)
135
+ - [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail)
136
+ - [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum)
137
+ - [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression)
138
+ - **Language:** en
139
+ <!-- - **License:** Unknown -->
140
+
141
+ ### Model Sources
142
+
143
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
144
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
145
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
146
+
147
+ ### Full Model Architecture
148
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
155
+
156
+ ## Usage
157
+
158
+ ### Direct Usage (Sentence Transformers)
159
+
160
+ First install the Sentence Transformers library:
161
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
165
+
166
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer")
+ # Run inference
+ sentences = [
+     'Two wrestlers jump in a ring while an official watches.',
+     'Two wrestlers were just tagged in on a tag team match.',
+     'Two men are dressed in makeup',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
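
The card lists cosine similarity as the model's similarity function, so the 3×3 matrix in the snippet above presumably holds pairwise cosines. As a rough illustration of what such a matrix contains, here is a pure-Python sketch; `cos_sim`, `similarity_matrix`, and the toy vectors are hypothetical names and data, not part of the library.

```python
import math
from typing import List, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise cosine similarities; entry [i][j] compares item i with item j."""
    return [[cos_sim(a, b) for b in embeddings] for a in embeddings]


# Toy 2-dimensional "embeddings" standing in for the 768-dimensional ones.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(toy)
print(len(matrix), len(matrix[0]))  # 3 3
```

The diagonal of the matrix compares each item with itself, so it is 1 up to rounding for any non-zero vector.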
187
+
188
+ <!--
189
+ ### Direct Usage (Transformers)
190
+
191
+ <details><summary>Click to see the direct usage in Transformers</summary>
192
+
193
+ </details>
194
+ -->
195
+
196
+ <!--
197
+ ### Downstream Usage (Sentence Transformers)
198
+
199
+ You can finetune this model on your own dataset.
200
+
201
+ <details><summary>Click to expand</summary>
202
+
203
+ </details>
204
+ -->
205
+
206
+ <!--
207
+ ### Out-of-Scope Use
208
+
209
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
210
+ -->
211
+
212
+ ## Evaluation
213
+
214
+ ### Metrics
215
+
216
+ #### Semantic Similarity
217
+ * Dataset: `sts-test`
218
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
219
+
+ | Metric              | Value     |
+ |:--------------------|:----------|
+ | pearson_cosine      | 0.7828    |
+ | **spearman_cosine** | **0.793** |
+ | pearson_manhattan   | 0.7959    |
+ | spearman_manhattan  | 0.7922    |
+ | pearson_euclidean   | 0.7948    |
+ | spearman_euclidean  | 0.7907    |
+ | pearson_dot         | 0.715     |
+ | spearman_dot        | 0.6966    |
+ | pearson_max         | 0.7959    |
+ | spearman_max        | 0.793     |
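
Each row of the table is a correlation between the model's similarity scores and the gold STS scores. Pearson measures linear agreement and Spearman measures agreement of rankings. The sketch below shows how the two coefficients can be computed in plain Python; the helper names and the toy scores are assumptions for illustration, and the evaluator's own implementation may differ.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical gold scores and model cosine similarities for four pairs.
gold = [0.0, 0.2, 0.5, 1.0]
predicted = [0.1, 0.3, 0.4, 0.9]
print(round(spearman(gold, predicted), 4))
```

Because Spearman depends only on ordering, a model can score well on it even when its raw similarities are not linearly calibrated against the gold scores.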
232
+
233
+ <!--
234
+ ## Bias, Risks and Limitations
235
+
236
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
237
+ -->
238
+
239
+ <!--
240
+ ### Recommendations
241
+
242
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
243
+ -->
244
+
245
+ ## Training Details
246
+
247
+ ### Training Datasets
248
+
249
+ #### nli-pairs
250
+
251
+ * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
252
+ * Size: 150,000 training samples
253
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
254
+ * Approximate statistics based on the first 1000 samples:
255
+ | | sentence1 | sentence2 |
256
+ |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
257
+ | type | string | string |
258
+ | details | <ul><li>min: 5 tokens</li><li>mean: 16.62 tokens</li><li>max: 62 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 9.46 tokens</li><li>max: 29 tokens</li></ul> |
259
+ * Samples:
260
+ | sentence1 | sentence2 |
261
+ |:---------------------------------------------------------------------------|:-------------------------------------------------|
262
+ | <code>A person on a horse jumps over a broken down airplane.</code> | <code>A person is outdoors, on a horse.</code> |
263
+ | <code>Children smiling and waving at camera</code> | <code>There are children present</code> |
264
+ | <code>A boy is jumping on skateboard in the middle of a red bridge.</code> | <code>The boy does a skateboarding trick.</code> |
265
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
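
With `scale` 20.0 and `cos_sim`, this loss can be read as a cross-entropy over in-batch candidates: for each `sentence1`, its paired `sentence2` is the correct class and every other `sentence2` in the batch acts as a negative. The following pure-Python sketch illustrates that reading on toy vectors; the function names are hypothetical and this is not the library code.

```python
import math
from typing import Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, candidate) for candidate in positives]
        # Numerically stable log-sum-exp over all in-batch candidates.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(value - top) for value in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two pairs: aligned pairs give a low loss, swapped pairs a high one.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = multiple_negatives_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = multiple_negatives_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

A larger batch supplies more negatives per anchor, which is one reason batch size matters for this family of losses.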
272
+
273
+ #### sts-label
274
+
275
+ * Dataset: [sts-label](https://huggingface.co/datasets/sentence-transformers/stsb) at [ab7a5ac](https://huggingface.co/datasets/sentence-transformers/stsb/tree/ab7a5ac0e35aa22088bdcf23e7fd99b220e53308)
276
+ * Size: 5,749 training samples
277
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
278
+ * Approximate statistics based on the first 1000 samples:
279
+ | | sentence1 | sentence2 | score |
280
+ |:--------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------|
281
+ | type | string | string | float |
282
+ | details | <ul><li>min: 6 tokens</li><li>mean: 9.81 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 9.74 tokens</li><li>max: 25 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.54</li><li>max: 1.0</li></ul> |
283
+ * Samples:
284
+ | sentence1 | sentence2 | score |
285
+ |:-----------------------------------------------------------|:----------------------------------------------------------------------|:------------------|
286
+ | <code>A plane is taking off.</code> | <code>An air plane is taking off.</code> | <code>1.0</code> |
287
+ | <code>A man is playing a large flute.</code> | <code>A man is playing a flute.</code> | <code>0.76</code> |
288
+ | <code>A man is spreading shreded cheese on a pizza.</code> | <code>A man is spreading shredded cheese on an uncooked pizza.</code> | <code>0.76</code> |
289
+ * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "pairwise_cos_sim"
+   }
+   ```
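
CoSENT is a ranking loss over sentence pairs: whenever one pair has a higher gold `score` than another, its cosine similarity should also be higher, and violations are penalised through a log-sum-exp. A minimal sketch under that reading is shown below; `cosent_loss` and the toy numbers are assumptions for illustration only.

```python
import math
from typing import Sequence


def cosent_loss(cosines: Sequence[float], labels: Sequence[float], scale: float = 20.0) -> float:
    """log(1 + sum of exp(scale * (cos_low - cos_high))) over pairs ordered by label."""
    # The leading 0.0 contributes exp(0) = 1, i.e. the "1 +" inside the log.
    terms = [0.0]
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                # Pair i is labelled more similar than pair j, so a positive
                # difference cosines[j] - cosines[i] is a ranking violation.
                terms.append(scale * (cosines[j] - cosines[i]))
    top = max(terms)
    return top + math.log(sum(math.exp(value - top) for value in terms))


# Two sentence pairs with gold scores 1.0 and 0.0 (toy values).
well_ordered = cosent_loss([0.9, 0.1], [1.0, 0.0])
mis_ordered = cosent_loss([0.1, 0.9], [1.0, 0.0])
print(well_ordered < mis_ordered)  # True
```

Only the relative order of the gold scores enters the loss, so the model is not pushed to reproduce the exact score values.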
296
+
297
+ #### vitaminc-pairs
298
+
299
+ * Dataset: [vitaminc-pairs](https://huggingface.co/datasets/tals/vitaminc) at [be6febb](https://huggingface.co/datasets/tals/vitaminc/tree/be6febb761b0b2807687e61e0b5282e459df2fa0)
300
+ * Size: 75,142 training samples
301
+ * Columns: <code>label</code>, <code>sentence1</code>, and <code>sentence2</code>
302
+ * Approximate statistics based on the first 1000 samples:
303
+ | | label | sentence1 | sentence2 |
304
+ |:--------|:-----------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
305
+ | type | int | string | string |
306
+ | details | <ul><li>1: 100.00%</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 17.44 tokens</li><li>max: 53 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 38.0 tokens</li><li>max: 151 tokens</li></ul> |
307
+ * Samples:
308
+ | label | sentence1 | sentence2 |
309
+ |:---------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
310
+ | <code>1</code> | <code>Penguins has a rating of less than 92 % , defined by more than 20 reviews on Rotten Tomatoes .</code> | <code>On review aggregator Rotten Tomatoes , the film holds an approval rating of 91 % based on 22 reviews , with an average rating of 7.14/10 .</code> |
311
+ | <code>1</code> | <code>Fluoxetine , acts as a positive allosteric modulator of the GABAA receptor at high concentrations , as does norfluoxetine though more potently .</code> | <code>In addition , it acts as a positive allosteric modulator of the GABAA receptor at high concentrations , and norfluoxetine does the same but more potently , actions which may be clinically-relevant .</code> |
312
+ | <code>1</code> | <code>Andrew Robertson is considered by many experts to be one of the best left backs .</code> | <code>He is considered by many pundits to be one of the best left backs in the world due to his pace and crossing ability.</code> |
313
+ * Loss: [<code>GISTEmbedLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#gistembedloss) with these parameters:
+   ```json
+   {'guide': SentenceTransformer(
+     (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+     (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+     (2): Normalize()
+   ), 'temperature': 0.05}
+   ```
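
The parameters above name a `guide` model and a `temperature` of 0.05. The general idea of guided in-batch negative selection is that a candidate negative is dropped when the guide model rates it as more similar to the anchor than the true positive, since it is likely a false negative. The sketch below is a deliberately simplified illustration that considers only anchor-to-positive similarities; the function name and the toy matrices are assumptions, and the library's loss may cover more cases.

```python
import math
from typing import Sequence


def guided_contrastive_loss(
    student_sim: Sequence[Sequence[float]],
    guide_sim: Sequence[Sequence[float]],
    temperature: float = 0.05,
) -> float:
    """Cross-entropy over in-batch candidates, skipping guide-flagged negatives.

    student_sim[i][j] and guide_sim[i][j] are the similarities between anchor i
    and positive j according to the trained model and the guide model.
    """
    losses = []
    for i, row in enumerate(student_sim):
        threshold = guide_sim[i][i]
        # Keep the true positive plus every negative the guide does not rate
        # as more similar to the anchor than the true positive.
        logits = [
            value / temperature
            for j, value in enumerate(row)
            if j == i or guide_sim[i][j] <= threshold
        ]
        target = row[i] / temperature
        top = max(logits)
        log_z = top + math.log(sum(math.exp(value - top) for value in logits))
        losses.append(log_z - target)
    return sum(losses) / len(losses)


student = [[0.5, 0.9], [0.1, 0.8]]
# Guide flags positive 1 as a likely false negative for anchor 0.
masked = guided_contrastive_loss(student, [[0.7, 0.95], [0.1, 0.9]])
# Guide flags nothing, so the high-scoring negative is penalised.
unmasked = guided_contrastive_loss(student, [[1.0, 0.0], [0.0, 1.0]])
print(masked < unmasked)  # True
```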
321
+
322
+ #### qnli-contrastive
323
+
324
+ * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
325
+ * Size: 104,743 training samples
326
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
327
+ * Approximate statistics based on the first 1000 samples:
328
+ | | sentence1 | sentence2 | label |
329
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------|
330
+ | type | string | string | int |
331
+ | details | <ul><li>min: 3 tokens</li><li>mean: 13.82 tokens</li><li>max: 39 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 34.56 tokens</li><li>max: 110 tokens</li></ul> | <ul><li>0: 100.00%</li></ul> |
332
+ * Samples:
333
+ | sentence1 | sentence2 | label |
334
+ |:-----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
335
+ | <code>Which Formula One racing team developed the C-X75's used for filming.</code> | <code>The C-X75s used for filming were developed by the engineering division of Formula One racing team Williams, who built the original C-X75 prototype for Jaguar.</code> | <code>0</code> |
336
+ | <code>When did the University of Michigan leave Detroit?</code> | <code>In June 2009, the Michigan State University College of Osteopathic Medicine which is based in East Lansing opened a satellite campus located at the Detroit Medical Center.</code> | <code>0</code> |
337
+ | <code>When did the Vlachs migrate into the region?</code> | <code>The Gorals of southern Poland and northern Slovakia are partially descended from Romance-speaking Vlachs who migrated into the region from the 14th to 17th centuries and were absorbed into the local population.</code> | <code>0</code> |
338
+ * Loss: [<code>OnlineContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
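
The card gives no parameters for this loss. As a rough sketch of the online contrastive idea, only "hard" pairs contribute: positives that are farther apart than the closest negative, and negatives that are closer than the farthest positive. The margin of 0.5, the use of precomputed distances, the treatment of label 1 as "similar", and the fallback for a batch with a single class are all assumptions made for this illustration.

```python
from typing import Sequence


def online_contrastive_loss(
    distances: Sequence[float],
    labels: Sequence[int],
    margin: float = 0.5,
) -> float:
    """Contrastive loss over hard pairs only (label 1 = similar, 0 = dissimilar)."""
    positives = [d for d, y in zip(distances, labels) if y == 1]
    negatives = [d for d, y in zip(distances, labels) if y == 0]
    # Hard negatives sit closer than the farthest positive pair.
    # Simplification: if the batch has no positives, keep every negative.
    hard_negatives = [d for d in negatives if not positives or d < max(positives)]
    # Hard positives sit farther than the closest negative pair.
    hard_positives = [d for d in positives if not negatives or d > min(negatives)]
    positive_loss = sum(d ** 2 for d in hard_positives)
    negative_loss = sum(max(0.0, margin - d) ** 2 for d in hard_negatives)
    return positive_loss + negative_loss


# Easy batch: the positive pair is close and the negative pair is far apart.
print(online_contrastive_loss([0.1, 0.9], [1, 0]))
# Hard batch: the positive pair is far apart and the negative pair is close.
print(online_contrastive_loss([0.8, 0.2], [1, 0]))
```

In the easy batch no pair counts as hard, so the loss is zero; in the hard batch both pairs contribute.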
339
+
340
+ #### scitail-pairs-qa
341
+
342
+ * Dataset: [scitail-pairs-qa](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
343
+ * Size: 14,987 training samples
344
+ * Columns: <code>sentence2</code> and <code>sentence1</code>
345
+ * Approximate statistics based on the first 1000 samples:
346
+ | | sentence2 | sentence1 |
347
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
348
+ | type | string | string |
349
+ | details | <ul><li>min: 7 tokens</li><li>mean: 16.04 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 15.14 tokens</li><li>max: 34 tokens</li></ul> |
350
+ * Samples:
351
+ | sentence2 | sentence1 |
352
+ |:--------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
353
+ | <code>Voltage is not the same as energy, but means the energy per unit charge.</code> | <code>What term is not the same as energy, but means the energy per unit charge?</code> |
354
+ | <code>A jellyfish does not have a circulatory system.</code> | <code>Name the type of system that a jellyfish does not have?</code> |
355
+ | <code>Insight learning is based on past experience and reasoning.</code> | <code>What type of learning is based on past experience and reasoning?</code> |
356
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
363
+
364
+ #### scitail-pairs-pos
365
+
366
+ * Dataset: [scitail-pairs-pos](https://huggingface.co/datasets/allenai/scitail) at [0cc4353](https://huggingface.co/datasets/allenai/scitail/tree/0cc4353235b289165dfde1c7c5d1be983f99ce44)
367
+ * Size: 8,600 training samples
368
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
369
+ * Approximate statistics based on the first 1000 samples:
370
+ | | sentence1 | sentence2 |
371
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
372
+ | type | string | string |
373
+ | details | <ul><li>min: 6 tokens</li><li>mean: 23.99 tokens</li><li>max: 65 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 15.54 tokens</li><li>max: 39 tokens</li></ul> |
374
+ * Samples:
375
+ | sentence1 | sentence2 |
376
+ |:-----------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------|
377
+ | <code>A) A calorie is a unit of measure used to express the amount of energy a food produces in the body.</code> | <code>Another unit of energy, used widely in the health professions and everyday life, is calorie ( cal )?</code> |
378
+ | <code>solid 1 A state that retains shape independent of the shape of the container it occupies.</code> | <code>Solid takes neither the shape nor the volume of its container.</code> |
379
+ | <code>Sometimes the two sides of a fracture moved due to the pressure and a fault was formed.</code> | <code>A fault is the fracture caused when rocks on both sides move.</code> |
380
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
387
+
388
+ #### xsum-pairs
389
+
390
+ * Dataset: [xsum-pairs](https://huggingface.co/datasets/sentence-transformers/xsum) at [788ddaf](https://huggingface.co/datasets/sentence-transformers/xsum/tree/788ddafe04e539956d56b567bc32a036ee7b9206)
391
+ * Size: 150,000 training samples
392
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
393
+ * Approximate statistics based on the first 1000 samples:
394
+ | | sentence1 | sentence2 |
395
+ |:--------|:-------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
396
+ | type | string | string |
397
+ | details | <ul><li>min: 13 tokens</li><li>mean: 346.32 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 26.95 tokens</li><li>max: 66 tokens</li></ul> |
398
+ * Samples:
399
+ | sentence1 | sentence2 |
400
+ |:----------|:----------|
401
+ | <code>Jacob Murphy fired in his 10th goal of the season from inside the box to give the Canaries the lead at the break.<br>Adam Hammill, Angus MacDonald and Marley Watkins all went close for the visitors after the restart.<br>Norwich then stretched their lead thanks to MacDonald's own goal to leave them five points behind sixth-placed Sheffield Wednesday.<br>Victory means caretaker boss Alan Irvine has now claimed four points from a possible six since the departure of Alex Neil.<br>The hosts dominated the early proceedings, with Jonny Howson and Alex Pritchard both being denied by Barnsley keeper Adam Davies.<br>After Hammill had a goal ruled out for a clear offside at the other end, winger Murphy gave the Canaries a deserved lead moments before the break when, having being picked out by Cameron Jerome, he drilled a shot low and into the corner of the net.<br>Hammill was unlucky to not get a strong enough flick on Andy Yiadom's cross to make it 1-1 after the restart and MacDonald saw a close-range effort well saved by Michael McGovern from the resulting corner.<br>But, after Steven Naismith fired over for the Canaries with just the keeper to beat, they doubled their lead in fortunate circumstances as an effort from Murphy deflected off MacDonald into the net.<br>Jerome and Howson then went close to adding a third as Norwich coasted to three points.<br>Norwich caretaker manager Alan Irvine:<br>"I was asked to take charge for two games and I have done that. 
I haven't heard anything more about what happens going forward, but I should imagine I will be speaking to someone soon to find out what happens next week.<br>"If that is to be my last game in charge it was a good way to finish - and the win certainly makes it interesting as far as the play-offs are concerned.<br>"Being five points behind sounds a lot better than being eight points behind - and as I said last week there are still plenty of points to play for."<br>Barnsley manager Paul Heckingbottom:<br>"The take-away message from that game is hit the target, score goals.<br>"There were plenty of positives to take away from it, but if you are going to get anything in this league you have got to be clinical in front of goal.<br>"It's frustrating, but there is still plenty to play for. We will keep striving to get that perfect performance and obviously want to win as many games as possible between now and the end of the season."<br>Match ends, Norwich City 2, Barnsley 0.<br>Second Half ends, Norwich City 2, Barnsley 0.<br>Hand ball by Nélson Oliveira (Norwich City).<br>Attempt missed. Ryan Kent (Barnsley) right footed shot from the centre of the box is close, but misses to the left. Assisted by Ryan Hedges with a cross.<br>Attempt saved. Nélson Oliveira (Norwich City) left footed shot from outside the box is saved in the centre of the goal.<br>Alex Pritchard (Norwich City) wins a free kick in the attacking half.<br>Foul by Alex Mowatt (Barnsley).<br>Corner, Barnsley. Conceded by Jonny Howson.<br>Foul by Graham Dorrans (Norwich City).<br>Matthew James (Barnsley) wins a free kick in the defensive half.<br>Attempt missed. Tom Bradshaw (Barnsley) left footed shot from the centre of the box is too high. Assisted by Gethin Jones with a cross.<br>Attempt missed. Steven Naismith (Norwich City) right footed shot from the right side of the box misses to the left. Assisted by Alex Pritchard.<br>Corner, Norwich City. Conceded by Angus MacDonald.<br>Attempt blocked. 
Jonny Howson (Norwich City) right footed shot from the right side of the box is blocked. Assisted by Graham Dorrans with a through ball.<br>Substitution, Norwich City. Graham Dorrans replaces Jacob Murphy.<br>Substitution, Norwich City. Nélson Oliveira replaces Cameron Jerome.<br>Substitution, Barnsley. Ryan Hedges replaces Adam Hammill.<br>Attempt missed. Ryan Kent (Barnsley) left footed shot from the centre of the box is high and wide to the left. Assisted by Matthew James with a cross.<br>Attempt saved. Cameron Jerome (Norwich City) right footed shot from the centre of the box is saved in the bottom right corner. Assisted by Jacob Murphy with a through ball.<br>Substitution, Barnsley. Alex Mowatt replaces Marley Watkins.<br>Corner, Barnsley. Conceded by Ivo Pinto.<br>Corner, Barnsley. Conceded by Russell Martin.<br>Attempt blocked. Tom Bradshaw (Barnsley) right footed shot from the right side of the box is blocked. Assisted by Ryan Kent.<br>Own Goal by Angus MacDonald, Barnsley. Norwich City 2, Barnsley 0.<br>Attempt saved. Jacob Murphy (Norwich City) right footed shot from the centre of the box is saved in the bottom right corner. Assisted by Alex Pritchard.<br>Attempt saved. Steven Naismith (Norwich City) left footed shot from the centre of the box is saved in the bottom left corner. Assisted by Steven Whittaker with a cross.<br>Ivo Pinto (Norwich City) wins a free kick in the defensive half.<br>Foul by Adam Hammill (Barnsley).<br>Attempt saved. Ryan Kent (Barnsley) right footed shot from outside the box is saved in the centre of the goal. Assisted by Marley Watkins.<br>Attempt missed. Josh Scowen (Barnsley) right footed shot from outside the box is high and wide to the right. 
Assisted by Adam Hammill.<br>Jacob Murphy (Norwich City) wins a free kick in the attacking half.<br>Foul by Angus MacDonald (Barnsley).<br>Ryan Bennett (Norwich City) wins a free kick in the defensive half.<br>Foul by Marc Roberts (Barnsley).<br>Ivo Pinto (Norwich City) is shown the yellow card for a bad foul.<br>Foul by Ivo Pinto (Norwich City).<br>Ryan Kent (Barnsley) wins a free kick in the attacking half.<br>Foul by Ryan Bennett (Norwich City).<br>Tom Bradshaw (Barnsley) wins a free kick in the attacking half.<br>Attempt missed. Steven Naismith (Norwich City) left footed shot from the left side of the box is too high. Assisted by Alex Pritchard.</code> | <code>Norwich City kept their Championship play-off hopes alive by beating Barnsley at Carrow Road.</code> |
402
+ | <code>Political reporter Samantha Maiden said the offensive text, which also contained strong language, was intended for disgraced ex-minister Jamie Briggs.<br>She said Mr Dutton apologised for the message about her article referring to Mr Briggs' recent resignation.<br>The BBC has approached Mr Dutton's office for comment.<br>He reportedly told News Corp in a statement he is expecting a "tough time" in Ms Maiden's next article.<br>"Sam and I have exchanged some robust language over the years so we had a laugh after this and I apologised to her straightaway, which she took in good faith," Mr Dutton was quoted as saying.<br>Former Cities Minister Jamie Briggs resigned last week following a complaint from a female public servant over his alleged conduct during a night out in Hong Kong.</code> | <code>Australia's Immigration Minister Peter Dutton has reportedly apologised for mistakenly sending an SMS to a journalist, calling her a "mad witch".</code> |
+ | <code>Demonstrators have moved around several sites since April to highlight a crisis in temporary housing.<br>The council's lawyer told the court "trespass, highways and planning laws" were the grounds for the case.<br>The cost to the council in terms of additional policing, security and legal costs has exceeded £100,000, he added.<br>Ahead of the hearing, tents were set up and a banner reading "The homeless resistance" was hung outside Manchester Civil Justice Centre.<br>'Grave and serious'<br>Protesters said they hoped to be offered "permanent, suitable accommodation".<br>Some had earlier refused temporary accommodation offered by the council because they said it was "not suitable" and they felt unsafe.<br>The council said it had engaged with the protestors and had offered them support, but it could not accept anti-social behaviour and disruption to residents and businesses.<br>Councillor Nigel Murphy added the exclusion order was "designed to prevent the recurrence of camps and not targeted at individual rough sleepers".<br>He said the council would work with police and court bailiffs to "regain possession" of areas taken over by camps in St Ann's Square and Castlefield as soon as possible.<br>John Clegg, from Unison's community branch, said there was a lack of social housing in Manchester.<br>He added: "There is a large amount of money for building private flats, more hotels are going up all the time, but there are no plans to build any social housing. That's wrong. That's absolutely wrong."<br>"In our view an injunction is a form of gating, and sending out a message that poor people are not wanted and should not be coming in to the city centre."</code> | <code>A Manchester City Council application for an injunction to stop the setting up of homeless camps in the city centre has been granted.</code> |
+ * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
+ ```json
+ {
+     "scale": 20.0,
+     "similarity_fct": "cos_sim"
+ }
+ ```
+
+ #### compression-pairs
+
+ * Dataset: [compression-pairs](https://huggingface.co/datasets/sentence-transformers/sentence-compression) at [605bc91](https://huggingface.co/datasets/sentence-transformers/sentence-compression/tree/605bc91d95631895ba25b6eda51a3cb596976c90)
+ * Size: 180,000 training samples
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | sentence1 | sentence2 |
+ |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+ | type | string | string |
+ | details | <ul><li>min: 10 tokens</li><li>mean: 31.89 tokens</li><li>max: 125 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 10.21 tokens</li><li>max: 28 tokens</li></ul> |
+ * Samples:
+ | sentence1 | sentence2 |
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------|
+ | <code>The USHL completed an expansion draft on Monday as 10 players who were on the rosters of USHL teams during the 2009-10 season were selected by the League's two newest entries, the Muskegon Lumberjacks and Dubuque Fighting Saints.</code> | <code>USHL completes expansion draft</code> |
+ | <code>Major League Baseball Commissioner Bud Selig will be speaking at St. Norbert College next month.</code> | <code>Bud Selig to speak at St. Norbert College</code> |
+ | <code>It's fresh cherry time in Michigan and the best time to enjoy this delicious and nutritious fruit.</code> | <code>It's cherry time</code> |
+ * Loss: [<code>MultipleNegativesSymmetricRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) with these parameters:
+ ```json
+ {
+     "scale": 20.0,
+     "similarity_fct": "cos_sim"
+ }
+ ```
+
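The loss parameters listed above (`scale` of 20.0 with `cos_sim`) can be illustrated with a small pure-Python sketch. It assumes the commonly described form of a symmetric multiple-negatives ranking loss: scaled cosine similarities between every anchor and every in-batch candidate, cross-entropy in both directions, then the average of the two. The function names and toy vectors are hypothetical and are not taken from the training code.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against the target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def symmetric_mnr_loss(anchors, positives, scale=20.0):
    """Average of anchor->positive and positive->anchor ranking losses.

    Row i of `positives` is the true match for row i of `anchors`;
    every other row in the batch acts as a negative.
    """
    n = len(anchors)
    scores = [[scale * cos_sim(a, p) for p in positives] for a in anchors]
    forward = sum(cross_entropy(scores[i], i) for i in range(n)) / n
    backward = sum(
        cross_entropy([scores[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (forward + backward) / 2

# Toy batch (hypothetical data): aligned pairs should score a lower loss
# than the same pairs with their positives swapped.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [aligned[1], aligned[0]]
loss_aligned = symmetric_mnr_loss(anchors, aligned)
loss_swapped = symmetric_mnr_loss(anchors, swapped)
```

The `scale` factor sharpens the softmax over cosine similarities, which are bounded in [-1, 1]; without it the in-batch negatives would be penalised only weakly.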
+ ### Evaluation Datasets
+
+ #### nli-pairs
+
+ * Dataset: [nli-pairs](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
+ * Size: 6,808 evaluation samples
+ * Columns: <code>sentence1</code> and <code>sentence2</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | sentence1 | sentence2 |
+ |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|
+ | type | string | string |
+ | details | <ul><li>min: 5 tokens</li><li>mean: 17.64 tokens</li><li>max: 63 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 9.67 tokens</li><li>max: 29 tokens</li></ul> |
+ * Samples:
+ | sentence1 | sentence2 |
+ |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------|
+ | <code>Two women are embracing while holding to go packages.</code> | <code>Two woman are holding packages.</code> |
+ | <code>Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.</code> | <code>Two kids in numbered jerseys wash their hands.</code> |
+ | <code>A man selling donuts to a customer during a world exhibition event held in the city of Angeles</code> | <code>A man selling donuts to a customer.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+ ```json
+ {
+     "scale": 20.0,
+     "similarity_fct": "cos_sim"
+ }
+ ```
+
+ #### qnli-contrastive
+
+ * Dataset: [qnli-contrastive](https://huggingface.co/datasets/nyu-mll/glue) at [bcdcba7](https://huggingface.co/datasets/nyu-mll/glue/tree/bcdcba79d07bc864c1c254ccfcedcce55bcc9a8c)
+ * Size: 5,463 evaluation samples
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | sentence1 | sentence2 | label |
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------|
+ | type | string | string | int |
+ | details | <ul><li>min: 6 tokens</li><li>mean: 14.13 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 36.58 tokens</li><li>max: 225 tokens</li></ul> | <ul><li>0: 100.00%</li></ul> |
+ * Samples:
+ | sentence1 | sentence2 | label |
+ |:--------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
+ | <code>What came into force after the new constitution was herald?</code> | <code>As of that day, the new constitution heralding the Second Republic came into force.</code> | <code>0</code> |
+ | <code>What is the first major city in the stream of the Rhine?</code> | <code>The most important tributaries in this area are the Ill below of Strasbourg, the Neckar in Mannheim and the Main across from Mainz.</code> | <code>0</code> |
+ | <code>What is the minimum required if you want to teach in Canada?</code> | <code>In most provinces a second Bachelor's Degree such as a Bachelor of Education is required to become a qualified teacher.</code> | <code>0</code> |
+ * Loss: [<code>OnlineContrastiveLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#onlinecontrastiveloss)
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 94
+ - `per_device_eval_batch_size`: 32
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 1e-10
+ - `num_train_epochs`: 2
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.33
+ - `save_safetensors`: False
+ - `fp16`: True
+ - `push_to_hub`: True
+ - `hub_model_id`: bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp
+ - `hub_strategy`: checkpoint
+ - `batch_sampler`: no_duplicates
+
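To make the interaction of `learning_rate`, `lr_scheduler_type` and `warmup_ratio` concrete, here is a minimal sketch of a linear-warmup, half-cosine-decay schedule. It assumes the usual shape of a cosine schedule with warmup, takes the total step count (14672) from the final row of the training log, and rounds the warmup length up; the exact rounding used by the training library is an assumption, and `lr_at` is a hypothetical helper, not a library function.

```python
import math

BASE_LR = 2e-05       # learning_rate from the list above
WARMUP_RATIO = 0.33   # warmup_ratio from the list above
TOTAL_STEPS = 14672   # assumption: final step reported in the training log

# Assumption: warmup length is the ratio times total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

def lr_at(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Sample the schedule at a few points for inspection.
schedule_preview = {s: lr_at(s) for s in (0, 734, WARMUP_STEPS, 7336, TOTAL_STEPS)}
```

Under these assumptions roughly the first third of training (4842 of 14672 steps) is spent ramping up, so the peak learning rate is only reached well into the first epoch.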
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 94
+ - `per_device_eval_batch_size`: 32
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 1e-10
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 2
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.33
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: False
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: True
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: bobox/DeBERTaV3-small-GeneralSentenceTransformer-checkpoints-tmp
+ - `hub_strategy`: checkpoint
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | qnli-contrastive loss | nli-pairs loss | sts-test_spearman_cosine |
+ |:------:|:-----:|:-------------:|:---------------------:|:--------------:|:------------------------:|
+ | None | 0 | - | 20.1737 | 4.0959 | - |
+ | 0.1001 | 734 | 4.796 | - | - | - |
+ | 0.2001 | 1468 | 1.3015 | 0.0358 | 0.9115 | - |
+ | 0.3002 | 2202 | 0.89 | - | - | - |
+ | 0.4002 | 2936 | 0.716 | 0.0168 | 0.5944 | - |
+ | 0.5003 | 3670 | 0.6365 | - | - | - |
+ | 0.6003 | 4404 | 0.5883 | 0.0164 | 0.4975 | - |
+ | 0.7004 | 5138 | 0.5192 | - | - | - |
+ | 0.8004 | 5872 | 0.4961 | 0.0288 | 0.4450 | - |
+ | 0.9005 | 6606 | 0.6035 | - | - | - |
+ | 1.0005 | 7340 | 0.4733 | 0.0110 | 0.4215 | - |
+ | 1.1006 | 8074 | 0.4002 | - | - | - |
+ | 1.2007 | 8808 | 0.3929 | 0.0454 | 0.3796 | - |
+ | 1.3007 | 9542 | 0.3826 | - | - | - |
+ | 1.4008 | 10276 | 0.3522 | 0.0178 | 0.3714 | - |
+ | 1.5008 | 11010 | 0.3627 | - | - | - |
+ | 1.6009 | 11744 | 0.3553 | 0.0257 | 0.3629 | - |
+ | 1.7009 | 12478 | 0.3406 | - | - | - |
+ | 1.8010 | 13212 | 0.3288 | 0.0289 | 0.3575 | - |
+ | 1.9010 | 13946 | 0.4563 | - | - | - |
+ | 2.0 | 14672 | - | 0.0320 | 0.3551 | 0.7930 |
+
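The step counts in the log above can be cross-checked against the hyperparameters with a few lines of arithmetic. This sketch assumes a single device and `gradient_accumulation_steps` of 1 (as listed), so one optimizer step consumes at most one batch of 94 samples; because batches can be smaller than the nominal size, the sample count is an upper bound rather than an exact figure.

```python
TOTAL_STEPS = 14672      # last row of the training log
NUM_EPOCHS = 2           # num_train_epochs
BATCH_SIZE = 94          # per_device_train_batch_size
LOGGING_INTERVAL = 734   # spacing between consecutive rows of the log

# Optimizer steps in one pass over the combined training data.
steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS

# Upper bound on training samples seen per epoch (batches may be smaller).
max_samples_per_epoch = steps_per_epoch * BATCH_SIZE

# Fraction of an epoch covered by one logging interval; this matches the
# first logged epoch value of about 0.1001.
epoch_fraction_per_log = LOGGING_INTERVAL / steps_per_epoch
```

Read this way, the training loss is logged roughly every tenth of an epoch and the two evaluation losses every fifth of an epoch (every 1468 steps).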
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.3.0+cu121
+ - Accelerate: 0.31.0
+ - Datasets: 2.20.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ #### CoSENTLoss
+ ```bibtex
+ @online{kexuefm-8847,
+     title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
+     author={Su Jianlin},
+     year={2022},
+     month={Jan},
+     url={https://kexue.fm/archives/8847},
+ }
+ ```
+
+ #### GISTEmbedLoss
+ ```bibtex
+ @misc{solatorio2024gistembed,
+     title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
+     author={Aivin V. Solatorio},
+     year={2024},
+     eprint={2402.16829},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "[MASK]": 128000
+ }
config.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "_name_or_path": "microsoft/deberta-v3-small",
+   "architectures": [
+     "DebertaV2Model"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-07,
+   "max_position_embeddings": 512,
+   "max_relative_positions": -1,
+   "model_type": "deberta-v2",
+   "norm_rel_ebd": "layer_norm",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "pooler_dropout": 0,
+   "pooler_hidden_act": "gelu",
+   "pooler_hidden_size": 768,
+   "pos_att_type": [
+     "p2c",
+     "c2p"
+   ],
+   "position_biased_input": false,
+   "position_buckets": 256,
+   "relative_attention": true,
+   "share_att_key": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 0,
+   "vocab_size": 128100
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.3.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
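The two modules listed in `modules.json` form a pipeline: the Transformer produces one vector per token, and the Pooling module (configured in `1_Pooling/config.json` with `pooling_mode_mean_tokens` set to true) averages them into a single sentence embedding. The following pure-Python sketch shows masked mean pooling under that reading; `mean_pool` and the toy inputs are hypothetical, and the zero-token guard is a simplification rather than the library's exact handling.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    token_embeddings: list of per-token vectors (lists of floats).
    attention_mask:   list of 0/1 flags; 0 marks padding to be ignored.
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # simplification: avoid dividing by zero
    return [t / count for t in total]

# Toy example (hypothetical values): the third token is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
```

With a `word_embedding_dimension` of 768 in the pooling config, the real sentence embedding would have 768 components rather than the two used in this toy example.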
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e6b8f13689928c18ece3856866a5eabeea8661d8106e9cb8141da0943dbf28da
+ size 565251810
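As a rough sanity check, the LFS pointer size can be related to the values in `config.json`. The sketch below assumes every weight is stored as a 4-byte float (consistent with `"torch_dtype": "float32"`) and ignores serialization overhead, so the parameter count is an approximate upper bound, not an exact figure.

```python
FILE_SIZE_BYTES = 565251810   # size from the pytorch_model.bin LFS pointer
BYTES_PER_FLOAT32 = 4         # assumption: all weights are float32
VOCAB_SIZE = 128100           # vocab_size in config.json
HIDDEN_SIZE = 768             # hidden_size in config.json

# Approximate parameter count implied by the file size.
approx_params = FILE_SIZE_BYTES / BYTES_PER_FLOAT32

# Parameters in the token-embedding matrix alone.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE

# Share of the checkpoint taken up by token embeddings.
embedding_share = embedding_params / approx_params
```

Under these assumptions the checkpoint holds roughly 141 million parameters, of which about 98 million (close to 70%) belong to the token-embedding matrix, which is why a 6-layer model still produces a file of more than 500 MB.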
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "[CLS]",
+   "cls_token": "[CLS]",
+   "eos_token": "[SEP]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+ size 2464616
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128000": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "[CLS]",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": false,
+   "eos_token": "[SEP]",
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "sp_model_kwargs": {},
+   "split_by_punct": false,
+   "tokenizer_class": "DebertaV2Tokenizer",
+   "unk_token": "[UNK]",
+   "vocab_type": "spm"
+ }