lizchu414 committed · Commit 1cf4d0f · verified · 1 parent: 44f558d

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": false,
    "pooling_mode_mean_tokens": true,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}

README.md ADDED
@@ -0,0 +1,728 @@
---
base_model: sentence-transformers/all-mpnet-base-v2
language:
- en
library_name: sentence-transformers
license: apache-2.0
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:5166
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: 'Question: Who is the dungeon master in the Knights of the Arcade
    comedy show, and how are the destinations and battles decided during the performance?'
  sentences:
  - 'Event Name: Knights of the Arcade: Epic D&D Adventure

    Categories: Entertainment, Nightlife

    Dates: Jun 29, 2024 - Jun 29, 2024 | 9:00 pm - 10:30 pm

    Location: Arcade Comedy Theater, 943 Liberty Ave, Pittsburgh, PA 15222

    Description: “Best Nerd Fantasy Come to Life” by Pittsburgh Magazine“A neo-geek
    wet dream” – Pittsburgh City PaperA comedy quest awaits! Knights of the Arcade
    is an award-winning comedy show that takes audiences on a wild, madcap adventure
    every month. A recurring cast of characters (a dwarf, a monk, a rogue, a sorcerer
    and a fighter) are joined by special guests and led by their maniacal dungeon
    master. Where they’re going, who they fight, and if they ultimately succeed is
    decided upon live dice that are rolled and projected on the theater wall.'
  - The Pirates are also often referred to as the "Bucs" or the "Buccos" (derived
    from buccaneer, a synonym for pirate). Since 2001 the team has played its home
    games at PNC Park, a 39,000-seat stadium along the Allegheny River in Pittsburgh's
    North Side. The Pirates previously played at Forbes Field from 1909 to 1970 and
    at Three Rivers Stadium from 1970 to 2000. Since 1948 the Pirates' colors have
    been black, gold and white, derived from the flag of Pittsburgh and matching the
    other major professional sports teams in Pittsburgh, the Steelers and the Penguins.The
    Pittsburgh Pirates are an American professional baseball team based in Pittsburgh.
    The Pirates compete in Major League Baseball (MLB) as a member club of the National
    League (NL) Central Division. Founded as part of the American Association in 1881
    under the name Pittsburgh Alleghenys, the club joined the National League in 1887
    and was a member of the National League East from 1969 through 1993. The Pirates
    have won five World
  - "STEELERS IN THE POSTSEASON (36-30)\nYear Record Game Date Opponent Attendance\
    \ Steelers Opponent Result\n2015 10-6 AFC Wild Card Game 01/09/2016 at Cincinnati\
    \ 63,257 18 16 W\nAFC Divisional Playoff 01/17/2016 at Denver 79,956 16 23 L\n\
    2016# 11-5 AFC Wild Card Game 01/08/2017 Miami 66,726 30 12 W\nAFC Divisional\
    \ Playoff 01/15/2017 at Kansas City 75,678 18 16 W\nAFC Championship Game 01/22/2017\
    \ at New England 66,829 36 17 L\n2017# 13-3 AFC Divisional Playoff 01/14/2018\
    \ Jacksonville 64,524 42 45 L\n2020# 12-4 AFC Wild Card Game 01/03/2021 Cleveland\
    \ - 37 48 L\n2021 9-7-1 AFC Wild Card Game 01/16/2022 at Kansas City 73,253 21\
    \ 42 L\n2023 10-7 AFC Wild Card Game 01/15/202 4 at Buffalo 70,040 17 31 L\n*AFC\
    \ Central Champion\n#AFC North Champion\n+AFC ChampionSTEELERS IN THE POSTSEASON\n\
    \ 2023 PITTSBURGH STEELERS\n 421\n STEELERS IN THE POSTSEASON"
- source_sentence: 'Question: What is the Local Services Tax and how is it collected?'
  sentences:
  - the 1916 Centennial of Pittsburgh's 1816 incorporation as a City. At the March
    1916 dedication ceremony, Mayor Joseph Armstrong placed a time capsule into the
    still under construction building. Two and a half years later
    in December 1917, he would become the first Mayor to call the City-County Building
    a second home. The missing time capsule has yet to be discovered.
  - 'The first City Hall at Market Square.

    The second City Hall on Smithfield Street.

    Mayor David Lawrence strikes the first blow for the demolition of the second City
    Hall.'
  - "EXEMPT P ERSON – a person who files an exemption certificate with his employer\
    \ affirming \nthat he reasonably expects to receive earned income and net profits\
    \ from all sources within the \nCity of less than twelve thousand dollars ($12,000)\
    \ in the calendar year for wh ich the exemption \ncertificate is filed. See Section\
    \ 301(h) below, and Section 2 of the Local Tax Enabling Act, 53 P.S. § \n6924.301.1,\
    \ for other exemptions. \nINCOME – all earned income and net profits from whatever\
    \ source derived, including but not \nlimited to salaries, wages, bonuses, commissions\
    \ and income from self -employment earned in \nPittsburgh. \nLOCAL SERVICES TAX\
    \ (LST) – a tax on individuals for the privilege of engaging in an \noccupation.\
    \ The Local Services Tax may be levied, assessed and collected by the political\
    \ \nsubdivision of the taxpayer’s primary place of employment. \nOCCUPATION –\
    \ any livelihood, job, trade, profession, business or enterprise of any kind for"
- source_sentence: '"What is the nature of the incident being investigated by Zone
    Five Officers in Homewood on April 23, 2024?"'
  sentences:
  - 'Event Name: Saturday Night Improv @ BGC!

    Date: Saturdays, 7:30-9:30 p.m.

    Location: BGC Community Activity Center: 113 N. Pacific Ave., Pittsburgh | Garfield

    Price Information: GET TICKETS: 10

    Categories: Comedy, Theater

    Description: It''s time to Love, Laugh and Enjoy. Join us at the BGC Activity
    Center Saturday evenings for an evening of improv with performances by Narsh and
    Penny Pressed! Shows start promptly at 7:30 PM so don''t be late! 412-441-6950


    Event Name: Swing City

    Date: Saturdays, 8 p.m.

    Location: Wightman School: 5604 Solway, Pittsburgh | Squirrel Hill

    Categories: Other Stuff

    Description: Learn & practice swing dancing skills w/ the Jim Adler Band. 412-759-1569'
  - 'Police Investigate Stabbing Incident in Beltzhoover - 04.23.2024

    Zone Five Officers Investigate Homewood Shooting Incident - 04.23.2024

    Violent Crimes Division VCU Detectives Make Firearms Arrest in Spring Garden -
    04.19.2024

    UPDATE: Detectives Seek Assistance in Search for Missing 12-Year-Old Girl - 04.19.2024

    UPDATE: Police Investigate Aggravated Assault on Riverwalk in Point State Park
    - 04.19.2024

    Police Investigate Homicide Inside Larimer Residence - 04.19.2024

    UPDATE: Police Seek the Public''s Help in Locating Missing Juvenile Male - 04.19.2024

    UPDATE: Pittsburgh Police Ask for Public''s Help to Find Missing Woman - 04.15.2024

    Police Investigate Shooting Incident in Allegheny Center - 04.13.2024

    UPDATE: Pittsburgh Public Safety Responds to Barge Emergency on Ohio River - 04.12.2024

    Police Make Ethnic Intimidation and Criminal Mischief Arrest in Squirrel Hill -
    04.12.2024

    UPDATE: Police Seek the Public''s Assistance in Locating Missing Boy - 04.11.2024'
  - "24\n \n$ (Millions)Select Major Expenditures, 2018-2022\n2018 2019 2020\n2021\
    \ 2022Health Insurance\nWorkers' CompensationPension and OPEBDebt Service050,000,000100,000,000150,000,000\n\
    Health Insurance\nThese expenditures are categorized within the Personnel – Employment\
    \ Benefits subclass. Prior to 2016 these \nexpenditures were budgeted centrally\
    \ in the Department of Human Resources and Civil Service. Except for retiree \n\
    health insurance, these expenditures are budgeted across all divisions based on\
    \ staffing levels and plan \nelections.\n Health Insurance\n52101 Health Insurance\n\
    52111 Other Insurance and Benefits\n52121 Retiree Health Insurance\nWorkers’\
    \ Compensation\nThese expenditures are categorized within the Personnel – Employment\
    \ Benefits subclass. Most medical, \nindemnity, and fees are budgeted across divisions\
    \ with outstanding claims. Legal and settlement expenses \nremain budgeted in\
    \ the Department of Human Resources and Civil Service with accounts organized\
    \ as follows:"
- source_sentence: 'Answer: The passage does not provide information about the longest
    reception for the Steelers in the Wild Card Game against Cincinnati.'
  sentences:
  - '09/08 Lions RESERVE/LEAGUE SUSP. T 27-27 +

    09/15 at Ravens RESERVE/LEAGUE SUSP. L 17-23

    09/22 Panthers RESERVE/LEAGUE SUSP. L 20-38

    09/29 Seahawks RESERVE/LEAGUE SUSP. L 10-27

    10/06 at Bengals RESERVE/LEAGUE SUSP. W 26-23

    10/13 Falcons RESERVE/LEAGUE SUSP. W 34-33

    10/20 at Giants S 7701.0 13.0 0 0 1 0 0 0 0 0 1 0 0 0 0000 000 W 27-21

    10/27 at Saints S 6510.0 0.0 0 0 0 1 0 0 0 1 0 0 0 0 0000 000 L 9-31

    10/31 49ers S 3210.0 0.0 0 0 0 0 0 0 0 0 0 0 0 0 0000 000 L 25-28

    11/10 at Buccaneers S 3300.0 0.0 0 0 0 0 0 0 0 0 0 0 0 0 0000 000 L 27-30

    11/17 at 49ers S 4400.0 0.0 0 0 0 0 0 0 0 1 0 0 0 0 0000 000 L 26-36

    12/01 Rams S 8530.0 0.0 1 10 0 0 0 0 0 0 0 0 0 0 0000 000 L 7-34

    12/08 Steelers S 5410.0 0.0 0 0 0 0 0 0 0 0 0 0 0 0 0000 000 L 17-23

    12/15 Browns S 7700.0 0.0 0 0 0 1 0 0 0 3 0 0 0 0 0000 000 W 38-24

    12/22 at Seahawks S 3300.0 0.0 1 18 0 0 0 0 0 0 0 0 0 0 0000 000 W 27-13

    12/29 at Rams S 7610.0 0.0 1 1 0 0 0 0 0 2 0 0 0 0 0000 000 L 24-31'
  - "Program \n• Clinical field education to emergency medicine physician residents\
    \ in the University of Pittsburgh \nEmergency Medicine Residency program \n \n\
    2023 Accomplishments\n \n• Financial Accomplishments:\n◦ Income from transports\
    \ increased by $1.8M from same time period last year\n◦ Bureau slated to bring\
    \ in an additional $5M in revenue for 2023\n• Personnel Accomplishments:\n◦ 6\
    \ new River Rescue Divers went through intensive training and all successfully\
    \ completed the \nclass\n◦ Increase in promotions to upper administration\n• Employee\
    \ Safety Initiatives: \n◦ Implementation of Cordico App for employee wellness\n\
    ◦ Access control security system installed in all EMS facilities \n• Equipment\
    \ Initiatives:\n◦ Bureau was approved to receive state of the art mannequins to\
    \ simulate real life patients during \nemergencies\n◦ Billing company to purchase\
    \ equipment/medication dispensary machines to be located in 5 areas"
  - "Pittsburgh 31\nCincinnati 17\nCINCINNATI — Pittsburgh scored 24 unanswered points\
    \ to turn a 17-7 deficit into a \n31-17 victory over Cincinnati in the AFC Wild\
    \ Card Game at Paul Brown Stadium. \nThe Pittsburgh offense compiled 346 total\
    \ yards led by QB Ben Roethlisberger, who \ntossed three touchdowns and finished\
    \ with a QB rating of 148.7. RB Jerome Bettis ran for 52 \nyards on 10 carries\
    \ (5.2 avg.) and one touchdown. WR Cedrick Wilson caught three passes \nfor 104\
    \ yards (34.7 avg.), with one touchdown. \nThe Steelers defense recorded four\
    \ sacks and two interceptions while holding the \nBengals to just 84 yards rushing.\
    \ \nCincinnati was dealt an early blow when starting QB Carson Palmer suffered\
    \ a torn \nACL on the first offensive play of the game. The Bengals jumped out\
    \ to a 10-0 lead with a \n23-yard field goal by K Shayne Graham and a 20-yard\
    \ touchdown run by RB Rudi Johnson.\nPittsburgh got on the board when RB Willie\
    \ Parker took a screen pass 19 yards for a"
- source_sentence: '"What cultural celebration will be honored at the 2024 Greater
    Pittsburgh Lunar New Year Gala, and what is the significance of this event in
    the community?"'
  sentences:
  - 'This page informs City of Pittsburgh residents about the city''s Snow Angels
    program. This page is also where volunteers can sign up, and recipients can submit
    a request.

    City Collection Equity Audit

    The City of Pittsburgh is conducting an audit to identify inequity and bias in
    the City’s collection of public art and memorials.

    Davis Avenue Bridge

    Design and construction for the new Davis Avenue Bridge between Brighton Heights
    and Riverview Park.

    South Side Park Public Art

    A new public art project is being planned in South Side Park. This is being done
    in coordination with the park’s Phase 1 renovations and funded by the Percent
    For Art.

    Projects that are no longer accepting feedback, but are now in the construction
    or development phase.

    PHAD Projects

    Current Projects – find out about ongoing projects underway throughout the city
    and learn how to apply for new projects each year.

    Emerald View Phase I Trails & Trailheads'
  - of Pittsburgh and greater southwestern Pennsylvania. Justin is employed within
    the Cultural Resources practice of Michael Baker International. He is Director
    Emeritus of Preservation Pittsburgh and a past president of the East Liberty Valley
    Historical Society. Justin is a graduate of the University of Pittsburgh (B.A.
    Architectural Studies, 2008) and Columbia University (M.S. Historic Preservation,
    2010).Todd Wilson, MBA, PE, is an award-winning transportation engineer, named
    one of Pittsburgh Business Times’ 20 Engineers to Know in 2022. He has co-authored
    two books on Pittsburgh’s bridges,Images of America Pittsburgh’s Bridges and Engineering
    Pittsburgh a History of Roads, Rails, Canals, Bridges, and More.An engineering
    graduate of Carnegie Mellon, Todd has extensive knowledge on bridges, having photographed
    them in all 50 states and 25 countries, and he has presented at many conferences.
    Check out his Pittsburgh bridge photography on Instagram @pghbridges.TOUR STARTS/ENDS:Gateway
  - 'Event Name: 2024 Greater Pittsburgh Lunar New Year Gala

    Categories: Arts + Culture, Community, Holidays, Nightlife

    Dates: Feb 3, 2024 - Feb 3, 2024 | 4:00 pm - 9:00 pm

    Location: PNC Theater, 350 Forbes Avenue, Pittsburgh, PA 15222'
model-index:
- name: MPNet base trained on synthetic Pittsburgh data
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: pittsburgh
      type: pittsburgh
    metrics:
    - type: cosine_accuracy@1
      value: 0.7375145180023229
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.9037940379403794
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.9368950832365467
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9628339140534262
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7375145180023229
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.30126467931345985
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1873790166473093
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09628339140534262
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7375145180023229
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.9037940379403794
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.9368950832365467
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9628339140534262
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8590408201907759
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.824762258110111
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.8263189855192845
      name: Cosine Map@100
    - type: dot_accuracy@1
      value: 0.7375145180023229
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.9037940379403794
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.9368950832365467
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.9628339140534262
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.7375145180023229
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.30126467931345985
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.1873790166473093
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.09628339140534262
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.7375145180023229
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.9037940379403794
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.9368950832365467
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.9628339140534262
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.8590408201907759
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.824762258110111
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.8263189855192845
      name: Dot Map@100
---

# MPNet base trained on synthetic Pittsburgh data

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision f1b1b820e405bb8644f5e8d9a3b98f9c9e0a3c58 -->
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
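
As a rough illustration of modules (1) and (2) above: only `pooling_mode_mean_tokens` is enabled, so the sentence embedding is the average of the non-padding token vectors, and `Normalize()` then rescales it to unit length. The pure-Python sketch below assumes the token embeddings and attention mask are already available as nested lists; `mean_pool` and `l2_normalize` are hypothetical helpers, not the library's tensor implementation.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    # Guard against an all-padding input so we never divide by zero.
    count = max(count, 1)
    return [value / count for value in summed]


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


# Toy example: 3 tokens with 4-dimensional embeddings; the last token is padding.
tokens = [[1.0, 2.0, 0.0, 2.0], [3.0, 0.0, 0.0, 2.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)    # [2.0, 1.0, 0.0, 2.0]; its norm is 3.0
embedding = l2_normalize(pooled)
print(embedding)
```

Masking matters here: without it, the padding row `[9.0, 9.0, 9.0, 9.0]` would pull every dimension of the average upward.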

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("lizchu414/mpnet-base-all-pittsburgh-squad")
# Run inference
sentences = [
    '"What cultural celebration will be honored at the 2024 Greater Pittsburgh Lunar New Year Gala, and what is the significance of this event in the community?"',
    'Event Name: 2024 Greater Pittsburgh Lunar New Year Gala\nCategories: Arts + Culture, Community, Holidays, Nightlife\nDates: Feb 3, 2024 - Feb 3, 2024 | 4:00 pm - 9:00 pm\nLocation: PNC Theater, 350 Forbes Avenue, Pittsburgh, PA 15222',
    "This page informs City of Pittsburgh residents about the city's Snow Angels program. This page is also where volunteers can sign up, and recipients can submit a request.\nCity Collection Equity Audit\nThe City of Pittsburgh is conducting an audit to identify inequity and bias in the City’s collection of public art and memorials.\nDavis Avenue Bridge\nDesign and construction for the new Davis Avenue Bridge between Brighton Heights and Riverview Park.\nSouth Side Park Public Art\nA new public art project is being planned in South Side Park. This is being done in coordination with the park’s Phase 1 renovations and funded by the Percent For Art.\nProjects that are no longer accepting feedback, but are now in the construction or development phase.\nPHAD Projects\nCurrent Projects – find out about ongoing projects underway throughout the city and learn how to apply for new projects each year.\nEmerald View Phase I Trails & Trailheads",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
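
Because the final `Normalize()` module returns unit-length embeddings, a plain dot product already equals cosine similarity, so semantic search over a corpus reduces to sorting dot products. A minimal sketch of top-k retrieval, assuming the embeddings have been converted to Python lists (for example via `model.encode(...).tolist()`); the toy 3-dimensional vectors and the `top_k` and `unit` helpers are illustrative only.

```python
import heapq
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def unit(v):
    """Stand-in for the model's Normalize() step on a toy vector."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def top_k(query, corpus, k):
    """Return (index, score) pairs for the k corpus vectors with the highest dot product."""
    scored = ((i, dot(query, doc)) for i, doc in enumerate(corpus))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


query = unit([1.0, 1.0, 0.0])
corpus = [unit([0.0, 0.0, 1.0]), unit([1.0, 0.9, 0.1]), unit([1.0, 0.0, 0.0])]

# On unit vectors the dot product and cosine similarity agree.
for doc in corpus:
    assert abs(dot(query, doc) - cosine(query, doc)) < 1e-12

print(top_k(query, corpus, k=2))  # index 1 ranks first, then index 2
```

Skipping the norm division is only safe because the embeddings are normalized; with an unnormalized model, longer vectors would be favored by the raw dot product.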

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval
* Dataset: `pittsburgh`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7375     |
| cosine_accuracy@3   | 0.9038     |
| cosine_accuracy@5   | 0.9369     |
| cosine_accuracy@10  | 0.9628     |
| cosine_precision@1  | 0.7375     |
| cosine_precision@3  | 0.3013     |
| cosine_precision@5  | 0.1874     |
| cosine_precision@10 | 0.0963     |
| cosine_recall@1     | 0.7375     |
| cosine_recall@3     | 0.9038     |
| cosine_recall@5     | 0.9369     |
| cosine_recall@10    | 0.9628     |
| cosine_ndcg@10      | 0.859      |
| cosine_mrr@10       | 0.8248     |
| cosine_map@100      | 0.8263     |
| dot_accuracy@1      | 0.7375     |
| dot_accuracy@3      | 0.9038     |
| dot_accuracy@5      | 0.9369     |
| dot_accuracy@10     | 0.9628     |
| dot_precision@1     | 0.7375     |
| dot_precision@3     | 0.3013     |
| dot_precision@5     | 0.1874     |
| dot_precision@10    | 0.0963     |
| dot_recall@1        | 0.7375     |
| dot_recall@3        | 0.9038     |
| dot_recall@5        | 0.9369     |
| dot_recall@10       | 0.9628     |
| dot_ndcg@10         | 0.859      |
| dot_mrr@10          | 0.8248     |
| **dot_map@100**     | **0.8263** |
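
The values in the table are consistent with each query having exactly one relevant passage: recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (for example 0.9038 / 3 ≈ 0.3013). That reading is an inference from the numbers rather than something this card states. Under that assumption, a minimal sketch of how the metrics follow from the 1-based rank at which each query's relevant passage was retrieved; `retrieval_metrics` is a hypothetical helper that follows the standard definitions, not the `InformationRetrievalEvaluator` source.

```python
import math


def retrieval_metrics(ranks, k):
    """IR metrics at cutoff k, assuming a single relevant passage per query.

    ranks: one entry per query, the 1-based rank of its relevant passage,
    or None if the passage was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    # One relevant item per query: a hit adds 1/k to precision and 1 to recall.
    precision = accuracy / k
    recall = accuracy
    # Reciprocal rank of the relevant passage, counted only inside the cutoff.
    mrr = sum(1.0 / r for r, hit in zip(ranks, hits) if hit) / n
    # With one relevant item the ideal DCG is 1, so nDCG equals DCG.
    ndcg = sum(1.0 / math.log2(r + 1) for r, hit in zip(ranks, hits) if hit) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "mrr": mrr, "ndcg": ndcg}


# Toy example: relevant passages ranked 1st, 2nd, 4th, and not retrieved.
metrics = retrieval_metrics([1, 2, 4, None], k=3)
print(metrics)
```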

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_eval_batch_size`: 2
- `eval_accumulation_steps`: 1
- `learning_rate`: 2e-05
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates
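
The model was trained with `MultipleNegativesRankingLoss` (see the tags and the citation below): for a batch of (question, passage) pairs, each question's own passage is the positive and every other passage in the batch serves as a negative, and the loss is cross-entropy over scaled similarity scores. The `no_duplicates` batch sampler keeps repeated texts out of a batch, where they would otherwise act as false negatives. A minimal pure-Python sketch, assuming unit-normalized embeddings and a scale of 20 (the library's default; this card does not list the value used); `multiple_negatives_ranking_loss` is a hypothetical name, not the library class.

```python
import math


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where anchors[i] should match positives[i].

    positives[j] for j != i act as in-batch negatives. Vectors are assumed to be
    unit-normalized, so the dot product equals cosine similarity.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * sum(a * p for a, p in zip(anchor, pos)) for pos in positives]
        # Numerically stable log-sum-exp: subtract the max logit first.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_norm - logits[i]  # -log softmax(logits)[i]
    return total / len(anchors)


# Toy batch of two pairs with 2-dimensional unit vectors.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each anchor matches its own positive
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each anchor matches the other pair's positive
print(multiple_negatives_ranking_loss(anchors, aligned))  # close to 0
print(multiple_negatives_ranking_loss(anchors, swapped))  # close to the scale, 20
```

Larger batches supply more in-batch negatives per query, which is one reason this loss is usually paired with a duplicate-free sampler.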

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 2
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: 1
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
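
The settings above (`lr_scheduler_type: linear`, `warmup_ratio: 0.1`, `learning_rate: 2e-05`) describe a schedule that ramps the learning rate up from 0 over the first 10% of optimizer steps and then decays it linearly back to 0. A minimal sketch of that schedule; rounding the warmup length up with `math.ceil` is an assumption about how the ratio becomes a step count, and the 375-step total is a hypothetical figure inferred from the training log below (step 100 falls at epoch 0.8, i.e. roughly 125 steps per epoch, over 3 epochs). `linear_warmup_lr` is an illustrative helper, not a Transformers API.

```python
import math


def linear_warmup_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: grow proportionally from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: shrink proportionally from base_lr to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Hypothetical run: about 125 steps per epoch for 3 epochs.
total = 375
for step in (0, 19, 38, 200, 375):
    print(step, linear_warmup_lr(step, total))
```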
664
+
665
+ ### Training Logs
666
+ | Epoch | Step | Training Loss | Validation Loss | pittsburgh_dot_map@100 |
667
+ |:-----:|:----:|:-------------:|:---------------:|:----------------------:|
668
+ | 0 | 0 | - | - | 0.5984 |
669
+ | 0.8 | 100 | 0.587 | 0.1954 | 0.7780 |
670
+ | 1.592 | 200 | 0.1828 | 0.1805 | 0.8020 |
671
+ | 2.384 | 300 | 0.2224 | 0.1605 | 0.8263 |
672
+
+
+ ### Framework Versions
+ - Python: 3.12.7
+ - Sentence Transformers: 3.2.0
+ - Transformers: 4.45.2
+ - PyTorch: 2.2.2+cu121
+ - Accelerate: 1.0.1
+ - Datasets: 3.0.1
+ - Tokenizers: 0.20.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
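The model card cites MultipleNegativesRankingLoss and trains with `batch_sampler: no_duplicates`. The pure-Python sketch below illustrates the idea behind that loss: each anchor is scored against every positive in the batch, and a softmax cross-entropy pushes the matching (diagonal) pair to the top, so the other positives act as in-batch negatives. It assumes cosine similarity and a scale of 20, which I believe are the library's defaults; the function names are hypothetical and this is not the library's implementation.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(
    anchors: List[Vector], positives: List[Vector], scale: float = 20.0
) -> float:
    """In-batch softmax loss: anchor i should score highest against positive i.

    Every other positive in the batch serves as a negative for anchor i.
    The scale of 20.0 is an assumed default.
    """
    assert anchors and len(anchors) == len(positives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp; loss is -log softmax of the diagonal entry.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_norm - logits[i]
    return total / len(anchors)


# Two identical pairs in one batch: each anchor's true positive ties with its duplicate.
print(multiple_negatives_ranking_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]))
```

The duplicate case yields log 2 (about 0.693) even though the embeddings are perfect, because the duplicated positive is wrongly treated as a negative. Avoiding such false negatives is the usual motivation for a no-duplicates batch sampler with in-batch-negative losses.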
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
+ "architectures": [
+ "MPNetModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 0,
+ "eos_token_id": 2,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 514,
+ "model_type": "mpnet",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 1,
+ "relative_attention_num_buckets": 32,
+ "torch_dtype": "float32",
+ "transformers_version": "4.45.2",
+ "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.2.0",
+ "transformers": "4.45.2",
+ "pytorch": "2.2.2+cu121"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3811aaf423a8ad4de3f64794d0844b94ca5d59b24a7de26b54303d41bba01f9f
+ size 437967672
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
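`modules.json` chains three stages: a `Transformer` that emits one vector per token, a `Pooling` module (configured in `1_Pooling/config.json` for mean pooling over 768-dimensional token embeddings), and a `Normalize` module. The pure-Python sketch below shows what the last two stages compute, assuming padding positions are excluded through the attention mask. The function names are hypothetical; this illustrates the arithmetic and is not the library's code.

```python
import math
from typing import List

Matrix = List[List[float]]


def mean_pool(token_embeddings: Matrix, attention_mask: List[int]) -> List[float]:
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for j, x in enumerate(vec):
                summed[j] += x
    count = max(count, 1)  # guard against an input that is entirely padding
    return [s / count for s in summed]


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def embed(token_embeddings: Matrix, attention_mask: List[int]) -> List[float]:
    """Mean pooling followed by L2 normalisation, mirroring modules 1 and 2."""
    return l2_normalize(mean_pool(token_embeddings, attention_mask))


# Toy 2-dimensional "token embeddings"; the third token is padding and is ignored.
print(embed([[3.0, 0.0], [1.0, 4.0], [100.0, 100.0]], [1, 1, 0]))
```

Because the final module makes every sentence embedding unit length, ranking by dot product produces the same order as ranking by cosine similarity.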
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 384,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,72 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "104": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "30526": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "<s>",
+ "do_lower_case": true,
+ "eos_token": "</s>",
+ "mask_token": "<mask>",
+ "max_length": 128,
+ "model_max_length": 384,
+ "pad_to_multiple_of": null,
+ "pad_token": "<pad>",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "sep_token": "</s>",
+ "stride": 0,
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "MPNetTokenizer",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render.