schawla2 committed (verified)
Commit dd050fe · 1 parent: b4705a3

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
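
This pooling configuration enables plain mean pooling over the 1024-dimensional token embeddings (the CLS, max, length-weighted, weighted-mean, and last-token modes are all disabled). A minimal sketch of what that operation computes, assuming PyTorch tensors shaped like the encoder output:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 1024); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # number of real tokens
    return summed / counts                                          # (batch, 1024)
```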
README.md ADDED
@@ -0,0 +1,760 @@
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: intfloat/e5-large-unsupervised
widget:
- source_sentence: What are the key components of the transparency provisions included
    in the Consolidated Appropriations Act of 2021 regarding healthcare?
  sentences:
  - The report includes information on legal proceedings under 'Note 13 — Commitments
    and Contingencies — Litigation and Other Legal Matters' which is a part of the
    consolidated financial statements
  - The Consolidated Appropriations Act of 2021 was signed into law in December 2020
    and contains further transparency provisions requiring group health plans and
    health insurance issuers to report certain prescription drug costs, overall spending
    on health services and prescription drugs, and information about premiums and
    the impact of rebates and other remuneration on premiums and out-of-pocket costs
    to the Tri-Departments.
  - In 2023, the company recorded other operating charges of $1,951 million.
- source_sentence: What technology does the Tax Advisor use and for what purpose in
    Intuit's offerings?
  sentences:
  - In 2023, Goldman Sachs' investments in funds at NAV primarily included firm-sponsored
    private equity, credit, real estate, and hedge funds. These funds are involved
    in various types of investments such as leveraged buyouts, recapitalizations,
    growth investments, and distressed investments for private equity, while credit
    funds are focused on providing private high-yield capital for leveraged and management
    buyout transactions. Real estate funds invest globally in real estate assets,
    and hedge funds adopt a fundamental bottom-up investment approach.
  - Using AI technologies, our Tax Advisor offering leverages information generated
    from our ProConnect Tax Online and Lacerte offerings to enable year-round tax
    planning services and communicate tax savings strategies to clients.
  - '''Note 13 — Commitments and Contingencies'' provides details about litigation
    and other legal matters in an Annual Report on Form 10-K.'
- source_sentence: What was the net revenue for the Data Center segment in 2023?
  sentences:
  - Data Center net revenue of $6.5 billion in 2023 increased by 7%, compared to net
    revenue of $6.0 billion in 2022.
  - Under its Class 2 insurance license, Caterpillar Insurance Co. Ltd. insures its
    parent and affiliates for general liability, property, auto liability and cargo.
    It also provides reinsurance to Caterpillar Insurance Company under a quota
    share reinsurance agreement for its contractual liability and contractors’ equipment
    programs in the United States.
  - Schwab’s funding of these remaining commitments is dependent upon the occurrence
    of certain conditions, and Schwab expects to pay substantially all of these commitments
    between 2024 and 2027.
- source_sentence: What are the three principles of liquidity risk management at Goldman
    Sachs?
  sentences:
  - The Company determines if an arrangement is a lease at inception and classifies
    its leases at commencement. Operating leases are included in operating lease right-of-use
    ("ROU") assets and current and noncurrent operating lease liabilities on the Company’s
    consolidated balance sheets.
  - Garmin Ltd. reported a net income of $1,289,636 for the fiscal year ended December
    30, 2023.
  - 'Goldman Sachs manages liquidity risk based on three principles: 1) hold sufficient
    excess liquidity in the form of GCLA to cover outflows during a stressed period,
    2) maintain appropriate Asset-Liability Management, and 3) maintain a viable Contingency
    Funding Plan.'
- source_sentence: What was the total cost and expenses reported by Berkshire Hathaway
    for the year ended December 31, 2023?
  sentences:
  - Total costs and expenses | | 321,144 | | | 266,484 | | | 243,752
  - Qulipta (atogepant) is a calcitonin gene-related peptide receptor antagonist indicated
    for the preventive treatment of episodic and chronic migraine in adults. Qulipta
    is commercialized in the United States and Canada and is approved in the European
    Union under the brand name Aquipta.
  - Item 3 'Legal Proceedings' is integrated by reference to other parts including
    Note 22 — 'Environmental and legal matters' and Part II, Item 8.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: E5 unsupervised Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.7271428571428571
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.85
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8785714285714286
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9114285714285715
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7271428571428571
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2833333333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17571428571428568
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09114285714285714
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7271428571428571
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.85
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8785714285714286
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9114285714285715
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.822517236613446
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7936921768707483
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7973883589026711
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.7271428571428571
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8457142857142858
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.88
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9128571428571428
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7271428571428571
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.28190476190476194
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.176
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09128571428571429
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7271428571428571
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8457142857142858
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.88
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9128571428571428
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8223709830528422
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.793145691609977
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7966990460475021
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.72
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8457142857142858
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8714285714285714
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9057142857142857
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.72
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.28190476190476194
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17428571428571424
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09057142857142855
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.72
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8457142857142858
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8714285714285714
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9057142857142857
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8159991941699124
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7869370748299319
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7906967878713818
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.7085714285714285
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8285714285714286
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8728571428571429
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8985714285714286
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7085714285714285
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2761904761904762
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17457142857142854
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08985714285714284
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7085714285714285
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8285714285714286
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8728571428571429
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8985714285714286
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8073517667504667
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7777108843537414
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7815591417851651
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.6757142857142857
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8185714285714286
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8457142857142858
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8842857142857142
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6757142857142857
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.27285714285714285
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16914285714285712
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08842857142857141
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6757142857142857
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8185714285714286
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8457142857142858
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8842857142857142
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7861731335824387
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7542681405895693
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7588497811523153
      name: Cosine Map@100
---

# E5 unsupervised Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-large-unsupervised](https://huggingface.co/intfloat/e5-large-unsupervised) on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [intfloat/e5-large-unsupervised](https://huggingface.co/intfloat/e5-large-unsupervised) <!-- at revision 15af9288f69a6291f37bfb89b47e71abc747b206 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - json
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("schawla2/e5-unsupervised-financial-matryoshka")
# Run inference
sentences = [
    'What was the total cost and expenses reported by Berkshire Hathaway for the year ended December 31, 2023?',
    'Total costs and expenses | | 321,144 | | | 266,484 | | | 243,752',
    'Qulipta (atogepant) is a calcitonin gene-related peptide receptor antagonist indicated for the preventive treatment of episodic and chronic migraine in adults. Qulipta is commercialized in the United States and Canada and is approved in the European Union under the brand name Aquipta.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
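
Because the model was trained with MatryoshkaLoss, the leading dimensions of each embedding are useful on their own. A minimal sketch of encoding at one of the evaluated sizes via the `truncate_dim` argument (the choice of 256 is illustrative; any of 768/512/256/128/64 was evaluated):

```python
from sentence_transformers import SentenceTransformer

# Keep only the first 256 dimensions of every embedding.
model_256 = SentenceTransformer(
    "schawla2/e5-unsupervised-financial-matryoshka",
    truncate_dim=256,
)

embeddings = model_256.encode([
    "What was the net revenue for the Data Center segment in 2023?",
    "Data Center net revenue of $6.5 billion in 2023 increased by 7%, compared to net revenue of $6.0 billion in 2022.",
])
print(embeddings.shape)
# (2, 256)

# Cosine similarity still works on the truncated vectors.
print(model_256.similarity(embeddings[:1], embeddings[1:]))
```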

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval

* Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | dim_768    | dim_512    | dim_256   | dim_128    | dim_64     |
|:--------------------|:-----------|:-----------|:----------|:-----------|:-----------|
| cosine_accuracy@1   | 0.7271     | 0.7271     | 0.72      | 0.7086     | 0.6757     |
| cosine_accuracy@3   | 0.85       | 0.8457     | 0.8457    | 0.8286     | 0.8186     |
| cosine_accuracy@5   | 0.8786     | 0.88       | 0.8714    | 0.8729     | 0.8457     |
| cosine_accuracy@10  | 0.9114     | 0.9129     | 0.9057    | 0.8986     | 0.8843     |
| cosine_precision@1  | 0.7271     | 0.7271     | 0.72      | 0.7086     | 0.6757     |
| cosine_precision@3  | 0.2833     | 0.2819     | 0.2819    | 0.2762     | 0.2729     |
| cosine_precision@5  | 0.1757     | 0.176      | 0.1743    | 0.1746     | 0.1691     |
| cosine_precision@10 | 0.0911     | 0.0913     | 0.0906    | 0.0899     | 0.0884     |
| cosine_recall@1     | 0.7271     | 0.7271     | 0.72      | 0.7086     | 0.6757     |
| cosine_recall@3     | 0.85       | 0.8457     | 0.8457    | 0.8286     | 0.8186     |
| cosine_recall@5     | 0.8786     | 0.88       | 0.8714    | 0.8729     | 0.8457     |
| cosine_recall@10    | 0.9114     | 0.9129     | 0.9057    | 0.8986     | 0.8843     |
| **cosine_ndcg@10**  | **0.8225** | **0.8224** | **0.816** | **0.8074** | **0.7862** |
| cosine_mrr@10       | 0.7937     | 0.7931     | 0.7869    | 0.7777     | 0.7543     |
| cosine_map@100      | 0.7974     | 0.7967     | 0.7907    | 0.7816     | 0.7588     |
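
The per-dimension results above can be reproduced with the same evaluator. A minimal sketch, with made-up query and corpus IDs standing in for the actual evaluation split:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Assumption: evaluate the 768-dimensional slice; repeat with other truncate_dim values for the other columns.
model = SentenceTransformer("schawla2/e5-unsupervised-financial-matryoshka", truncate_dim=768)

queries = {"q1": "What was the net revenue for the Data Center segment in 2023?"}
corpus = {
    "d1": "Data Center net revenue of $6.5 billion in 2023 increased by 7%, compared to net revenue of $6.0 billion in 2022.",
    "d2": "In 2023, the company recorded other operating charges of $1,951 million.",
}
relevant_docs = {"q1": {"d1"}}  # which corpus entries answer each query

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dim_768")
print(evaluator(model))  # reports cosine_accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100
```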
476
+
477
+ <!--
478
+ ## Bias, Risks and Limitations
479
+
480
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
481
+ -->
482
+
483
+ <!--
484
+ ### Recommendations
485
+
486
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
487
+ -->
488
+
489
+ ## Training Details
490
+
491
+ ### Training Dataset
492
+
493
+ #### json
494
+
495
+ * Dataset: json
496
+ * Size: 6,300 training samples
497
+ * Columns: <code>anchor</code> and <code>positive</code>
498
+ * Approximate statistics based on the first 1000 samples:
499
+ | | anchor | positive |
500
+ |:--------|:---------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
501
+ | type | string | string |
502
+ | details | <ul><li>min: 8 tokens</li><li>mean: 20.8 tokens</li><li>max: 51 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 45.24 tokens</li><li>max: 326 tokens</li></ul> |
503
+ * Samples:
504
+ | anchor | positive |
505
+ |:--------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
506
+ | <code>How many full-time employees did Microsoft report as of June 30, 2023?</code> | <code>As of June 30, 2023, we employed approximately 221,000 people on a full-time basis, 120,000 in the U.S. and 101,000 internationally.</code> |
507
+ | <code>What was the total amount CSC paid for Series G preferred stock repurchases in 2023?</code> | <code>In 2023, CSC repurchased 42,036 depositary shares representing interests in Series G preferred stock for a total amount of $42 million.</code> |
508
+ | <code>What does Note 13 in the Annual Report on Form 10-K discuss?</code> | <code>For a discussion of legal and other proceedings in which we are involved, see Note 13 - Commitments and Contingencies in the Notes to Consolidated Financial Statements.</code> |
509
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
510
+ ```json
511
+ {
512
+ "loss": "MultipleNegativesRankingLoss",
513
+ "matryoshka_dims": [
514
+ 768,
515
+ 512,
516
+ 256,
517
+ 128,
518
+ 64
519
+ ],
520
+ "matryoshka_weights": [
521
+ 1,
522
+ 1,
523
+ 1,
524
+ 1,
525
+ 1
526
+ ],
527
+ "n_dims_per_step": -1
528
+ }
529
+ ```
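
A minimal sketch of constructing this loss with sentence-transformers, mirroring the parameters listed above (variable names are illustrative):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("intfloat/e5-large-unsupervised")

# In-batch-negatives ranking loss, applied at every Matryoshka dimension.
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,  # use every listed dimension at each training step
)
```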

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

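The non-default values above map directly onto `SentenceTransformerTrainingArguments`. A minimal sketch under assumptions (the output directory and the epoch-based save strategy are not stated in the card):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="e5-unsupervised-financial-matryoshka",  # assumption: any local path
    num_train_epochs=4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: needed to pair with load_best_model_at_end
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate texts within a batch
)
```
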
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch     | Step    | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:---------:|:-------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 0.2030    | 10      | 9.3166        | -                      | -                      | -                      | -                      | -                     |
| 0.4061    | 20      | 3.7163        | -                      | -                      | -                      | -                      | -                     |
| 0.6091    | 30      | 2.8216        | -                      | -                      | -                      | -                      | -                     |
| 0.8122    | 40      | 1.9313        | -                      | -                      | -                      | -                      | -                     |
| 1.0       | 50      | 1.5613        | 0.8230                 | 0.8237                 | 0.8153                 | 0.8036                 | 0.7771                |
| 1.2030    | 60      | 1.0926        | -                      | -                      | -                      | -                      | -                     |
| 1.4061    | 70      | 0.3367        | -                      | -                      | -                      | -                      | -                     |
| 1.6091    | 80      | 0.3958        | -                      | -                      | -                      | -                      | -                     |
| 1.8122    | 90      | 0.6527        | -                      | -                      | -                      | -                      | -                     |
| 2.0       | 100     | 0.4483        | 0.8202                 | 0.8209                 | 0.8118                 | 0.8033                 | 0.7792                |
| 2.2030    | 110     | 0.1823        | -                      | -                      | -                      | -                      | -                     |
| 2.4061    | 120     | 0.0494        | -                      | -                      | -                      | -                      | -                     |
| 2.6091    | 130     | 0.1204        | -                      | -                      | -                      | -                      | -                     |
| 2.8122    | 140     | 0.2021        | -                      | -                      | -                      | -                      | -                     |
| 3.0       | 150     | 0.2088        | 0.8211                 | 0.8213                 | 0.8148                 | 0.8064                 | 0.7825                |
| 3.2030    | 160     | 0.062         | -                      | -                      | -                      | -                      | -                     |
| 3.4061    | 170     | 0.022         | -                      | -                      | -                      | -                      | -                     |
| 3.6091    | 180     | 0.0654        | -                      | -                      | -                      | -                      | -                     |
| 3.8122    | 190     | 0.1481        | -                      | -                      | -                      | -                      | -                     |
| **3.934** | **196** | **-**         | **0.8225**             | **0.8224**             | **0.816**              | **0.8074**             | **0.7862**            |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.16
- Sentence Transformers: 3.3.1
- Transformers: 4.48.1
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,25 @@
{
  "_name_or_path": "intfloat/e5-large-unsupervised",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.48.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
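
This is the standard BERT-large configuration inherited from the base checkpoint. A quick, hedged sanity check that the uploaded repository exposes the same shape (assuming transformers is installed):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("schawla2/e5-unsupervised-financial-matryoshka")
print(config.hidden_size, config.num_hidden_layers, config.num_attention_heads)
# Expected from the file above: 1024 24 16
```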
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "3.3.1",
    "transformers": "4.48.1",
    "pytorch": "2.5.1+cu124"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:06fc06f3de914b8196855266ddb18ececade008ce0c048ce0e87c990034f2ea3
size 1340612432
modules.json ADDED
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
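
modules.json chains the three stages: the Transformer encoder at the repository root, mean Pooling from `1_Pooling`, and L2 normalization from `2_Normalize`. A rough equivalent with plain transformers, shown as a sketch rather than a drop-in replacement:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

repo = "schawla2/e5-unsupervised-financial-matryoshka"
tokenizer = AutoTokenizer.from_pretrained(repo)
encoder = AutoModel.from_pretrained(repo)

batch = tokenizer(
    ["What was the net revenue for the Data Center segment in 2023?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state            # module 0: Transformer

mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)  # module 1: Pooling
sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)   # module 2: Normalize
print(sentence_embeddings.shape)
# torch.Size([1, 1024])
```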
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff