{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "807c463e-cd0f-4ffb-b974-b19a33a674bb",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Demo of the KAILAS UAT labeler capabilities\n",
    "This notebook shows how to use KAILAS to automatically tag text with Unified Astronomy Thesaurus (UAT) concepts."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fbf6cb4-1110-4883-9ffd-d39d81e4301a",
   "metadata": {},
   "source": [
    "## Preliminaries\n",
    "1. load UAT concepts\n",
    "2. load a dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9bc5eaae-4e37-425f-9a0e-b6a7e8eca3e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# We need a version of the UAT that is better suited to our needs:\n",
    "# a flat list of concepts, each with a 'uri' and a 'name' field.\n",
    "# Build it from the original UAT.json (https://github.com/astrothesaurus/UAT/blob/master/UAT.json)\n",
    "# and replace the path below with the location of the result.\n",
    "import json\n",
    "\n",
    "with open('../data/UAT/UAT_list.json', 'r') as f:\n",
    "    uat_list = json.load(f)\n",
    "\n",
    "# build the dict that maps each UAT ID (the last part of the concept URI, as a string) to its name\n",
    "uat_names = {}\n",
    "for entry in uat_list:\n",
    "    uat_id = entry['uri'].split('/')[-1]\n",
    "    uat_names[uat_id] = entry['name'].lower().strip()\n",
    "\n",
    "# sort by key (the keys are strings, so the order is lexicographic)\n",
    "uat_names = dict(sorted(uat_names.items()))"
   ]
  },
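  {
   "cell_type": "markdown",
   "id": "sketch-flatten-uat-md",
   "metadata": {},
   "source": [
    "The cell above expects `UAT_list.json`, a flat list of concepts with `uri` and `name` fields, rather than the original `UAT.json`. The next cell sketches one way such a flat list could be produced. It is not the authors' preprocessing script: it assumes that `UAT.json` is a nested tree in which each node is a dict with `uri`, `name` and an optional `children` list, and the function name `flatten_uat` is hypothetical. A concept that appears under several parents is kept only once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-flatten-uat-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical sketch: flatten a nested UAT tree into a list of {'uri', 'name'} dicts.\n",
    "# Assumes each node has 'uri', 'name' and an optional 'children' list.\n",
    "def flatten_uat(nodes):\n",
    "    flat = {}  # uri -> entry, so a concept listed under several parents is kept once\n",
    "    stack = list(nodes)\n",
    "    while stack:\n",
    "        node = stack.pop()\n",
    "        if node['uri'] not in flat:\n",
    "            flat[node['uri']] = {'uri': node['uri'], 'name': node['name']}\n",
    "        stack.extend(node.get('children') or [])\n",
    "    return list(flat.values())\n",
    "\n",
    "\n",
    "# Usage on a made-up two-level tree (not real UAT content):\n",
    "toy_tree = [\n",
    "    {'uri': 'https://example.org/uat/101', 'name': 'Parent concept',\n",
    "     'children': [{'uri': 'https://example.org/uat/102', 'name': 'Child concept'}]},\n",
    "]\n",
    "toy_list = flatten_uat(toy_tree)\n",
    "print(sorted(entry['name'] for entry in toy_list))  # ['Child concept', 'Parent concept']\n",
    "\n",
    "# To write the real file (both paths are assumptions):\n",
    "# import json\n",
    "# with open('UAT.json') as f:\n",
    "#     uat_tree = json.load(f)\n",
    "# with open('../data/UAT/UAT_list.json', 'w') as f:\n",
    "#     json.dump(flatten_uat(uat_tree), f)"
   ]
  },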
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "688bdebb-f711-49c7-8c1b-f9ede9529ce8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the open dataset\n",
    "from datasets import load_dataset\n",
    "uat_dataset = load_dataset('adsabs/SciX_UAT_keywords')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d4f2fd73-9cd1-44cb-bf7a-df3c91e19509",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatasetDict({\n",
       "    val: Dataset({\n",
       "        features: ['bibcode', 'title', 'abstract', 'verified_uat_ids', 'verified_uat_labels'],\n",
       "        num_rows: 3025\n",
       "    })\n",
       "    train: Dataset({\n",
       "        features: ['bibcode', 'title', 'abstract', 'verified_uat_ids', 'verified_uat_labels'],\n",
       "        num_rows: 18677\n",
       "    })\n",
       "})"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Hugging Face datasets can be indexed either by row (int) or by column (str)\n",
    "uat_dataset"
   ]
  },
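  {
   "cell_type": "markdown",
   "id": "sketch-dataset-indexing-md",
   "metadata": {},
   "source": [
    "`uat_dataset['val']` is a Hugging Face `Dataset`. Indexing it with an integer returns one row as a dict, indexing it with a column name returns that column, and indexing it with a slice returns a dict that maps each column name to a list of values. The main demo relies on the slice form. The next cell illustrates all three on a small made-up dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-dataset-indexing-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Dataset indexing on made-up rows (not SciX data)\n",
    "from datasets import Dataset\n",
    "\n",
    "toy = Dataset.from_dict({\n",
    "    'title': ['Title A', 'Title B', 'Title C'],\n",
    "    'abstract': ['Abstract A', 'Abstract B', 'Abstract C'],\n",
    "})\n",
    "\n",
    "row = toy[0]                  # one row: {'title': 'Title A', 'abstract': 'Abstract A'}\n",
    "titles = list(toy['title'])   # one column; list() also works if a version returns a column object\n",
    "batch = toy[0:2]              # a slice: {'title': ['Title A', 'Title B'], 'abstract': [...]}\n",
    "print(row)\n",
    "print(titles)\n",
    "print(batch)"
   ]
  },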
  {
   "cell_type": "markdown",
   "id": "c2a9c73b-1d0b-4c8b-92e5-d964b8191003",
   "metadata": {},
   "source": [
    "## Main Demo\n",
    "1. create the prediction pipeline\n",
    "2. make your predictions\n",
    "3. format predictions for readability"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f7e90b5-8aa0-4272-9c3c-ff365c95a0f9",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
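    "# create the pipeline\n",
    "\n",
    "from transformers import pipeline, AutoTokenizer\n",
    "\n",
    "model_path = 'adsabs/KAILAS'\n",
    "revision = None\n",
    "\n",
    "# 'sentiment-analysis' is an alias for the 'text-classification' task,\n",
    "# which loads the model with AutoModelForSequenceClassification\n",
    "pipe = pipeline(task='sentiment-analysis',\n",
    "                model=model_path,\n",
    "                tokenizer=AutoTokenizer.from_pretrained(model_path,\n",
    "                                                        revision=revision,\n",
    "                                                        model_max_length=512,\n",
    "                                                        do_lower_case=False,\n",
    "                                                       ),\n",
    "                revision=revision,\n",
    "                num_workers=1,\n",
    "                batch_size=32,\n",
    "                # return a score for every label, in the model's label order;\n",
    "                # newer transformers releases deprecate this in favor of top_k=None,\n",
    "                # which also sorts the scores in descending order\n",
    "                return_all_scores=True,\n",
    "                truncation=True,\n",
    "               )"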
   ]
  },
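  {
   "cell_type": "markdown",
   "id": "sketch-sigmoid-softmax-md",
   "metadata": {},
   "source": [
    "The scores returned by this pipeline are per label and do not sum to one across labels: in the outputs further down, the third abstract gets scores above 0.99 for three different concepts. That behaviour is consistent with a multi-label classifier that applies an independent sigmoid to each label's logit, rather than a softmax across labels, and it is why a fixed per-label threshold makes sense. The next cell contrasts the two functions on made-up logits; it is an illustration, not code from KAILAS."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-sigmoid-softmax-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "def sigmoid(x):\n",
    "    return 1.0 / (1.0 + math.exp(-x))\n",
    "\n",
    "def softmax(xs):\n",
    "    m = max(xs)  # subtract the max for numerical stability\n",
    "    exps = [math.exp(x - m) for x in xs]\n",
    "    total = sum(exps)\n",
    "    return [e / total for e in exps]\n",
    "\n",
    "logits = [3.0, 2.5, -4.0]  # made-up logits for three labels\n",
    "\n",
    "multi_label = [sigmoid(x) for x in logits]  # each label is scored on its own\n",
    "single_label = softmax(logits)              # labels share one unit of probability\n",
    "\n",
    "print('sigmoid:', [round(p, 3) for p in multi_label], 'sum =', round(sum(multi_label), 3))\n",
    "print('softmax:', [round(p, 3) for p in single_label], 'sum =', round(sum(single_label), 3))"
   ]
  },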
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "bf6071ce-692c-4870-a0c1-1d8249614227",
   "metadata": {},
   "outputs": [],
   "source": [
    "# custom top-k helper: the k entries with the highest 'score'\n",
    "import heapq\n",
    "\n",
    "def top_k_scores(scores, k):\n",
    "    return heapq.nlargest(k, scores, key=lambda x: x['score'])"
   ]
  },
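  {
   "cell_type": "markdown",
   "id": "sketch-top-k-md",
   "metadata": {},
   "source": [
    "`heapq.nlargest(k, iterable, key=...)` returns the same items as sorting in descending order by the key and keeping the first `k`; the Python documentation describes it as equivalent to `sorted(iterable, key=key, reverse=True)[:k]`. It avoids a full sort, which helps when `k` is much smaller than the number of UAT labels. The next cell checks the equivalence on made-up scores, repeating the helper so that it runs on its own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-top-k-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import heapq\n",
    "\n",
    "def top_k_scores(scores, k):\n",
    "    # same helper as above: the k entries with the highest 'score'\n",
    "    return heapq.nlargest(k, scores, key=lambda x: x['score'])\n",
    "\n",
    "toy_scores = [  # made-up pipeline-style output for one text\n",
    "    {'label': '1', 'score': 0.10},\n",
    "    {'label': '2', 'score': 0.90},\n",
    "    {'label': '3', 'score': 0.45},\n",
    "    {'label': '4', 'score': 0.02},\n",
    "]\n",
    "\n",
    "best_two = top_k_scores(toy_scores, k=2)\n",
    "by_sorting = sorted(toy_scores, key=lambda x: x['score'], reverse=True)[:2]\n",
    "print([s['label'] for s in best_two])  # ['2', '3']\n",
    "print(best_two == by_sorting)          # True"
   ]
  },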
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "1ae70736-19fd-42c7-bd5f-7491e6a97cd8",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "BIBCODE: 2022ApJ...933..110X\n",
      "\tTITLE: Spatially Resolved Ionized Outflows Extending to   2 kpc in Seyfert 1 Galaxy NGC 7469 Revealed by the Very Large Telescope/MUSE\n",
      "\tABSTRACT: The Seyfert 1 galaxy NGC 7469 possesses a prominent nuclear starburst ring and a luminous active galactic nucleus (AGN). Evidence of an outflow in the innermost nuclear region has been found in previous works. We detect the ionized gas outflow on a larger scale in the galaxy using the archival Very Large Telescope/MUSE and Chandra observations. The optical emission lines are modeled using two Gaussian components, and a nonparametric approach is applied to measure the kinematics of [O III] and Hα emitting gas. Line ratio diagnostics and spatially resolved maps are derived to examine the origin of the outflow. The kiloparsec-scale kinematics of [O III] are dominated by a blueshifted component whereas the velocity map of Hα shows a rotational disk with a complex nonrotational substructure. The starburst wind around the circumnuclear ring is confirmed, and we find evidence of an AGN-driven outflow extending to a radial distance of ~ 2 kpc from the nucleus, with a morphology consistent with a nearly face-on ionization cone. The previously reported circumnuclear outflow resembles part of the bright base. We derive mass and energy outflow rates for both the starburst wind and the AGN-driven outflow. The estimated kinetic coupling efficiency of the kiloparsec-scale AGN outflow is ${\\dot{E}}_{\\mathrm{out}}/{L}_{\\mathrm{bol}}\\sim 0.1 \\% $ Ėout/Lbol~0.1% , lower than the threshold predicted by the \"two-stage\" theoretical model for effective feedback. Our results reinforce the importance of spatially resolved study to disentangle feedback where AGNs and starbursts coexist, which may be common during the cosmic noon of black hole and galaxy growth.\n",
      "\n",
      "AUTHOR ASSIGNED: ['seyfert galaxies', 'luminous infrared galaxies', 'interstellar medium wind']\n",
      "MODEL ASSIGNED: [('galactic winds', '0.9747'), ('active galactic nuclei', '0.4349'), ('starburst galaxies', '0.3764'), ('galaxy evolution', '0.2259')]\n",
      "\n",
      "NEXT SCORES [('stellar feedback', '0.0507'), ('agn host galaxies', '0.0381'), ('star formation', '0.0153'), ('galaxy interactions', '0.0127'), ('galaxy kinematics', '0.0120'), ('active galaxies', '0.0085'), ('interstellar medium', '0.0061'), ('galaxy winds', '0.0045'), ('seyfert galaxies', '0.0020'), ('hydrodynamical simulations', '0.0018'), ('compact galaxies', '0.0018')]\n",
      "\n",
      "\n",
      "BIBCODE: 2023ApJ...958...52T\n",
      "\tTITLE: X-Ray Spectral Variations of Circinus X-1 Observed with NICER throughout an Entire Orbital Cycle\n",
      "\tABSTRACT: Circinus X-1 (Cir X-1) is a neutron star binary with an elliptical orbit of 16.6 days. The source is unique for its extreme youth, providing a key to understanding early binary evolution. However, its X-ray variability is too complex to reach a clear interpretation. We conducted the first high-cadence (every 4 hr, on average) observations covering one entire orbit using the NICER X-ray telescope. The X-ray flux behavior can be divided into stable, dip, and flaring phases. The X-ray spectra in all phases can be described by a common model consisting of a partially covered disk blackbody emission and the line features from a highly ionized photoionized plasma. The spectral change over the orbit is attributable to rapid changes of the partial covering medium in the line of sight and gradual changes of the disk blackbody emission. Emission lines of H- and He-like Mg, Si, S, and Fe are detected, most prominently in the dip phase. The Fe emission lines change to absorption in the course of the transition from the dip phase to the flaring phase. The estimated ionization degree indicates no significant changes, suggesting that the photoionized plasma is stable over the orbit. We propose a simple model in which the disk blackbody emission is partially blocked by a local medium in the line of sight that has spatial structures depending on the azimuth of the accretion disk. Emission lines upon the continuum emission are from the photoionized plasma located outside of the blocking material.\n",
      "\n",
      "AUTHOR ASSIGNED: ['x-ray binary stars', 'atomic spectroscopy', 'spectroscopy', 'ionization', 'plasma astrophysics', 'high energy astrophysics']\n",
      "MODEL ASSIGNED: [('x-ray astronomy', '0.9219'), ('high mass x-ray binary stars', '0.3562')]\n",
      "\n",
      "NEXT SCORES [('x-ray binary stars', '0.1496'), ('x-ray sources', '0.0848'), ('polarimetry', '0.0595'), ('black hole physics', '0.0335'), ('radio jets', '0.0133'), ('x-ray active galactic nuclei', '0.0117'), ('active galactic nuclei', '0.0066'), ('non-thermal radiation sources', '0.0048'), ('x-ray observatories', '0.0044'), ('x-ray detectors', '0.0037'), ('symbiotic binary stars', '0.0032'), ('extragalactic astronomy', '0.0031'), ('spectroscopy', '0.0026'), ('gamma-ray bursts', '0.0022'), ('high energy astrophysics', '0.0021'), ('quiet solar corona', '0.0020'), ('astrophysical black holes', '0.0017'), ('spectropolarimetry', '0.0015')]\n",
      "\n",
      "\n",
      "BIBCODE: 2024AJ....167...64B\n",
      "\tTITLE: VLTI/GRAVITY Provides Evidence the Young, Substellar Companion HD 136164 Ab Formed Like a \"Failed Star\"\n",
      "\tABSTRACT: Young, low-mass brown dwarfs orbiting early-type stars, with low mass ratios (q ≲ 0.01), appear to be intrinsically rare and present a formation dilemma: could a handful of these objects be the highest-mass outcomes of \"planetary\" formation channels (bottom up within a protoplanetary disk), or are they more representative of the lowest-mass \"failed binaries\" (formed via disk fragmentation or core fragmentation)? Additionally, their orbits can yield model-independent dynamical masses, and when paired with wide wavelength coverage and accurate system age estimates, can constrain evolutionary models in a regime where the models have a wide dispersion depending on the initial conditions. We present new interferometric observations of the 16 Myr substellar companion HD 136164 Ab (HIP 75056 Ab) made with the Very Large Telescope Interferometer (VLTI)/GRAVITY and an updated orbit fit including proper motion measurements from the Hipparcos-Gaia Catalog of Accelerations. We estimate a dynamical mass of 35 ± 10 M <SUB>J</SUB> (q ~ 0.02), making HD 136164 Ab the youngest substellar companion with a dynamical mass estimate. The new mass and newly constrained orbital eccentricity (e = 0.44 ± 0.03) and separation (22.5 ± 1 au) could indicate that the companion formed via the low-mass tail of the initial mass function. Our atmospheric fit to a SPHINX M-dwarf model grid suggests a subsolar C/O ratio of 0.45 and 3 × solar metallicity, which could indicate formation in a circumstellar disk via disk fragmentation. Either way, the revised mass estimate likely excludes bottom-up formation via core accretion in a circumstellar disk. HD 136164 Ab joins a select group of young substellar objects with dynamical mass estimates; epoch astrometry from future Gaia data releases will constrain the dynamical mass of this crucial object further.\n",
      "\n",
      "AUTHOR ASSIGNED: ['brown dwarfs', 'substellar companion stars', 'orbit determination', 'orbits']\n",
      "MODEL ASSIGNED: [('radial velocity', '0.8792'), ('exoplanets', '0.8334'), ('exoplanet detection methods', '0.5305'), ('exoplanet dynamics', '0.3328'), ('extrasolar gaseous giant planets', '0.2636')]\n",
      "\n",
      "NEXT SCORES [('direct imaging', '0.0761'), ('transit photometry', '0.0569'), ('exoplanet atmospheres', '0.0440'), ('exoplanet evolution', '0.0425'), ('exoplanet structure', '0.0157'), ('brown dwarfs', '0.0146'), ('exoplanet systems', '0.0138'), ('solar-terrestrial interactions', '0.0083'), ('atmospheric composition', '0.0079'), ('natural satellites (extrasolar)', '0.0056'), ('interferometers', '0.0056'), ('gaussian processes regression', '0.0049'), ('exoplanet atmospheric variability', '0.0043'), ('exoplanet atmospheric composition', '0.0028'), ('astrometry', '0.0023'), ('interplanetary magnetic fields', '0.0019'), ('solar analogs', '0.0016')]\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# MAIN DEMO\n",
    "# pick a few samples from the validation split;\n",
    "# each model input is the title and the abstract joined into one string\n",
    "\n",
    "num_pred = 3\n",
    "start = 510\n",
    "\n",
    "temp_dataset = uat_dataset['val'][start:start+num_pred]\n",
    "# keep exactly one input per row (even when the title is empty), so that\n",
    "# prediction i always lines up with row i of temp_dataset\n",
    "sentences = [' '.join(part for part in (t, a) if part)\n",
    "             for t, a in zip(temp_dataset['title'], temp_dataset['abstract'])]\n",
    "\n",
    "# make predictions\n",
    "all_sentence_scores = pipe(sentences)\n",
    "\n",
    "# convert labels to strings: the keys of uat_names are strings, and the labels\n",
    "# are best treated as identifiers rather than integers anyway\n",
    "all_sentence_scores = [[{'label': str(s['label']), 'score': s['score']} for s in sample_scores]\n",
    "                       for sample_scores in all_sentence_scores]\n",
    "\n",
    "# format for readability: labels scoring at least `threshold` are reported as assigned,\n",
    "# labels scoring between 1% of the threshold and the threshold are shown as next candidates\n",
    "threshold = 0.15\n",
    "top_sentence_scores = [[{'label': uat_names[l['label']], 'score': l['score']}\n",
    "                        for l in top_k_scores(s, k=1000) if l['score'] >= threshold]\n",
    "                       for s in all_sentence_scores]\n",
    "\n",
    "next_sentence_scores = [[{'label': uat_names[l['label']], 'score': l['score']}\n",
    "                         for l in top_k_scores(s, k=1000) if 0.01*threshold <= l['score'] < threshold]\n",
    "                        for s in all_sentence_scores]\n",
    "\n",
    "for i in range(min(10, num_pred)):\n",
    "    print('BIBCODE:', temp_dataset['bibcode'][i])\n",
    "    print('\\tTITLE:', temp_dataset['title'][i])\n",
    "    print('\\tABSTRACT:', temp_dataset['abstract'][i])\n",
    "    print()\n",
    "    print('AUTHOR ASSIGNED:', temp_dataset['verified_uat_labels'][i])\n",
    "\n",
    "    if len(top_sentence_scores[i]) > 0:\n",
    "        print('MODEL ASSIGNED:', [(x['label'], '{:.4f}'.format(x['score'])) for x in top_sentence_scores[i]])\n",
    "        print()\n",
    "        print('NEXT SCORES', [(x['label'], '{:.4f}'.format(x['score'])) for x in next_sentence_scores[i]])\n",
    "\n",
    "    print()\n",
    "    print()\n"
   ]
  },
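  {
   "cell_type": "markdown",
   "id": "sketch-threshold-split-md",
   "metadata": {},
   "source": [
    "The demo splits each text's scores into two bands: labels at or above `threshold` (printed as MODEL ASSIGNED) and labels between `0.01 * threshold` and `threshold` (printed as NEXT SCORES). The hypothetical helper `split_by_threshold` in the next cell packages that logic as one function and falls back to the raw label ID when an ID is missing from the name lookup. The label IDs, names and scores are made up; with this notebook's variables it would be called as `split_by_threshold(all_sentence_scores[0], uat_names)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-threshold-split-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "def split_by_threshold(scores, names, threshold=0.15, floor_ratio=0.01):\n",
    "    # Hypothetical helper: returns (assigned, next_scores) as lists of (name, score)\n",
    "    # pairs, each sorted by decreasing score.\n",
    "    ranked = sorted(scores, key=lambda s: s['score'], reverse=True)\n",
    "    assigned, next_scores = [], []\n",
    "    for s in ranked:\n",
    "        name = names.get(s['label'], s['label'])  # fall back to the raw ID if unknown\n",
    "        if s['score'] >= threshold:\n",
    "            assigned.append((name, s['score']))\n",
    "        elif s['score'] >= floor_ratio * threshold:\n",
    "            next_scores.append((name, s['score']))\n",
    "    return assigned, next_scores\n",
    "\n",
    "\n",
    "# made-up IDs, names and scores\n",
    "toy_names = {'1': 'galactic winds', '2': 'starburst galaxies', '3': 'cosmology'}\n",
    "toy_scores = [\n",
    "    {'label': '1', 'score': 0.97},\n",
    "    {'label': '2', 'score': 0.05},\n",
    "    {'label': '3', 'score': 0.0001},\n",
    "    {'label': '9', 'score': 0.20},\n",
    "]\n",
    "assigned, next_scores = split_by_threshold(toy_scores, toy_names)\n",
    "print('ASSIGNED:', assigned)   # [('galactic winds', 0.97), ('9', 0.2)]\n",
    "print('NEXT:', next_scores)    # [('starburst galaxies', 0.05)]"
   ]
  },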
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "9d90c6a5-7a0a-4762-ae5d-33523bd6bc9a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Note: truncation is enabled, so only the first 512 tokens (special tokens included) of a long input are used.\n",
    "# Check this with inputs of 505 to 514 words, each word being '1'\n",
    "sentences = [' '.join(['1'] * j) for j in range(505, 515)]\n",
    "sentence_scores = pipe(sentences)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "e4b80a73-577a-464c-add2-2356c3adff18",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(505, [{'label': 2189, 'score': 0.8530406355857849}]),\n",
       " (506, [{'label': 2189, 'score': 0.8583861589431763}]),\n",
       " (507, [{'label': 2189, 'score': 0.8545922040939331}]),\n",
       " (508, [{'label': 2189, 'score': 0.8484249114990234}]),\n",
       " (509, [{'label': 2189, 'score': 0.8524807095527649}]),\n",
       " (510, [{'label': 2189, 'score': 0.8559486269950867}]),\n",
       " (511, [{'label': 2189, 'score': 0.8559486865997314}]),\n",
       " (512, [{'label': 2189, 'score': 0.8559486269950867}]),\n",
       " (513, [{'label': 2189, 'score': 0.8559486269950867}]),\n",
       " (514, [{'label': 2189, 'score': 0.8559486865997314}])]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[(len(sent.split()), top_k_scores(scores, k=1)) for sent, scores in zip(sentences,sentence_scores)]"
   ]
  },
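  {
   "cell_type": "markdown",
   "id": "sketch-truncation-check-md",
   "metadata": {},
   "source": [
    "In the check above, the top score stops changing (up to floating-point noise) once the input reaches 510 words. That is consistent with a 512-token limit that includes two special tokens, assuming each `1` is a single token for this tokenizer. To see whether a particular text is truncated, one can compare its untruncated token count with the tokenizer's `model_max_length`. The helper `is_truncated` in the next cell is a hypothetical sketch; it only calls the tokenizer on a string and reads `model_max_length`, and the stub tokenizer stands in for the real one so that the example runs without downloading the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-truncation-check-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "def is_truncated(tokenizer, text):\n",
    "    # Hypothetical helper: tokenize without truncation (special tokens are added by default)\n",
    "    # and compare the length with the model's maximum input length.\n",
    "    n_tokens = len(tokenizer(text)['input_ids'])\n",
    "    return n_tokens > tokenizer.model_max_length\n",
    "\n",
    "\n",
    "class StubTokenizer:\n",
    "    # Stand-in for illustration: one token per whitespace-separated word,\n",
    "    # plus two special tokens, like [CLS] and [SEP] in BERT-style models.\n",
    "    model_max_length = 512\n",
    "\n",
    "    def __call__(self, text):\n",
    "        return {'input_ids': [0] + [1] * len(text.split()) + [2]}\n",
    "\n",
    "\n",
    "stub = StubTokenizer()\n",
    "checks = {n: is_truncated(stub, ' '.join(['1'] * n)) for n in (509, 510, 511)}\n",
    "print(checks)  # {509: False, 510: False, 511: True}\n",
    "\n",
    "# With the real pipeline: [is_truncated(pipe.tokenizer, s) for s in sentences]"
   ]
  },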
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "0cd3c521-b25e-4768-b3f8-14f28d9c9c48",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Demo with manually entered texts: an off-topic (electronics) abstract, two astronomy abstracts and a nonsense sentence with typos\n",
    "sentences = ['This work discusses a junction-less nanowire tunnel field effect transistor (JLN-TFET) that combines the advantages of a junction-less field effect transistor (JLFET) and a tunnel field effect transistor (TFET). With a hetero-structure device made of silicon (Si) and germanium (Ge), an amalgamation of gate engineering and channel engineering is investigated. To eliminate junctions in the structure, a uniformly high dosage of doping (1019cm-3) has been employed throughout. In contrast to the source work function, which is set at 5.93 eV, the gate work function is set at 4.5 eV. When compared to junction less nanowire tunnel FET (JLTFET), the modified gate-all-around hetero junction less nanowire tunnel field effect transistor (GAA-H-JLNTFET) performs better. The proposed structure GAA-H-JLNTFET exhibits the ON current (ION) 6.5 × 10−5µA/m, the off current (IOFF) measures 2.97 × 10−20µA/m, the subthreshold slope (SS) is 12mV/Dec, and ION/IOFFis≈1015which makes them immune to short channel effect and suitable for low power application in Nano regime. Further, in this work, the proposed structure is utilized to implement the dielectric modulated low-power biosensor. The drain current is taken as the sensitivity parameter. Five different biomolecules sensitivity are measured and found better than the previous published results. For the simulation and analysis, the Silvaco Atlas 2D simulator with non-local band-to-band tunneling is used. ',\n",
    "             'We report observations from the Hubble Space Telescope (HST) of Cepheid variables in the host galaxies of 42 Type Ia supernovae (SNe Ia) used to calibrate the Hubble constant (H 0). These include the complete sample of all suitable SNe Ia discovered in the last four decades at redshift z ≤ 0.01, collected and calibrated from ≥1000 HST orbits, more than doubling the sample whose size limits the precision of the direct determination of H 0. The Cepheids are calibrated geometrically from Gaia EDR3 parallaxes, masers in NGC 4258 (here tripling that sample of Cepheids), and detached eclipsing binaries in the Large Magellanic Cloud. All Cepheids in these anchors and SN Ia hosts were measured with the same instrument (WFC3) and filters (F555W, F814W, F160W) to negate zero-point errors. We present multiple verifications of Cepheid photometry and six tests of background determinations that show Cepheid measurements are accurate in the presence of crowded backgrounds. The SNe Ia in these hosts calibrate the magnitude-redshift relation from the revised Pantheon+ compilation, accounting here for covariance between all SN data and with host properties and SN surveys matched throughout to negate systematics. We decrease the uncertainty in the local determination of H 0 to 1 km s-1 Mpc-1 including systematics. We present results for a comprehensive set of nearly 70 analysis variants to explore the sensitivity of H 0 to selections of anchors, SN surveys, redshift ranges, the treatment of Cepheid dust, metallicity, form of the period-luminosity relation, SN color, peculiar-velocity corrections, sample bifurcations, and simultaneous measurement of the expansion history. Our baseline result from the Cepheid-SN Ia sample is H 0 = 73.04 ± 1.04 km s-1 Mpc-1, which includes systematic uncertainties and lies near the median of all analysis variants. We demonstrate consistency with measures from HST of the TRGB between SN Ia hosts and NGC 4258, and include them simultaneously to yield 72.53 ± 0.99 km s-1 Mpc-1. The inclusion of high-redshift SNe Ia yields H 0 = 73.30 ± 1.04 km s-1 Mpc-1 and q 0 = -0.51 ± 0.024. We find a 5σ difference with the prediction of H 0 from Planck cosmic microwave background observations under ΛCDM, with no indication that the discrepancy arises from measurement uncertainties or analysis variations considered to date. The source of this now long-standing discrepancy between direct and cosmological routes to determining H 0 remains unknown.',\n",
    "             'We use archival COBE/DIRBE data to construct a map of polycyclic aromatic hydrocarbon (PAH) emission in the λ-Orionis region. The presence of the 3.3 μm PAH feature within the DIRBE 3.5 μm band and the corresponding lack of significant PAH spectral features in the adjacent DIRBE bands (1.25, 2.2, and 4.9 μm) enable estimation of the PAH contribution to the 3.5 μm data. Having the shortest wavelength of known PAH features, the 3.3 μm feature probes the smallest PAHs, which are also the leading candidates for carriers of anomalous microwave emission (AME). We use this map to investigate the association between the AME and the emission from PAH molecules. We find that the spatial correlation in λ-Orionis is higher between AME and far-infrared dust emission (as represented by the DIRBE 240 μm map) than it is between our PAH map and AME. This finding, in agreement with previous studies using PAH features at longer wavelengths, is in tension with the hypothesis that AME is due to spinning PAHs. However, the expected correlation between mid-infrared and microwave emission could potentially be degraded by different sensitivities of each emission mechanism to local environmental conditions even if PAHs are the carriers of both.',\n",
    "             'THis is a noew sentence with typoes and not really about astro anyways',\n",
    "            ]\n",
    "\n",
    "\n",
    "all_sentence_scores = pipe(sentences)\n",
    "# again convert the labels to strings so that they match the keys of uat_names\n",
    "all_sentence_scores = [[{'label':str(s['label']), 'score':s['score']} for s in sample_scores] for sample_scores in all_sentence_scores]\n",
    "\n",
    "top_sentence_scores = [[ {'label':uat_names[l['label']], 'score':l['score']} \n",
    "                        for l in top_k_scores(s, k=3)] \n",
    "                       for s in all_sentence_scores]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "c7aea0c3-ae49-464a-bca9-ae46cbeb1cbc",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[{'label': 'astronomical instrumentation', 'score': 0.9473740458488464},\n",
       "  {'label': 'astronomy data modeling', 'score': 0.49145984649658203},\n",
       "  {'label': 'space vehicle instruments', 'score': 0.39475584030151367}],\n",
       " [{'label': 'hubble constant', 'score': 0.993058443069458},\n",
       "  {'label': 'cosmology', 'score': 0.006782000884413719},\n",
       "  {'label': 'planetary nebulae', 'score': 0.001997443614527583}],\n",
       " [{'label': 'interstellar dust', 'score': 0.9986469149589539},\n",
       "  {'label': 'polycyclic aromatic hydrocarbons', 'score': 0.99810791015625},\n",
       "  {'label': 'interstellar medium', 'score': 0.9940189123153687}],\n",
       " [{'label': 'time series analysis', 'score': 0.2862895429134369},\n",
       "  {'label': 'astronomy data analysis', 'score': 0.08515171706676483},\n",
       "  {'label': 'optical telescopes', 'score': 0.04666740819811821}]]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top_sentence_scores"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "3a7b8b95-9917-4408-85fb-e74186f9ca7d",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'label': '0', 'score': 4.918351805827115e-07},\n",
       " {'label': '2', 'score': 2.4442993407092217e-08},\n",
       " {'label': '3', 'score': 1.2135334372942452e-06},\n",
       " {'label': '4', 'score': 7.317441941268044e-08},\n",
       " {'label': '5', 'score': 1.078589502867544e-05},\n",
       " {'label': '6', 'score': 2.913926877567974e-08},\n",
       " {'label': '7', 'score': 2.268499343927033e-07},\n",
       " {'label': '8', 'score': 2.4015573529823087e-08},\n",
       " {'label': '9', 'score': 2.046675717792823e-08},\n",
       " {'label': '10', 'score': 1.2331685006472526e-08}]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# the score for every label is available\n",
    "all_sentence_scores[0][0:10]"
   ]
  },
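  {
   "cell_type": "markdown",
   "id": "sketch-score-lookup-md",
   "metadata": {},
   "source": [
    "Because a score is returned for every label, the score of one specific concept can also be looked up by name by inverting the ID-to-name dictionary. The helper `score_for_concept` in the next cell is hypothetical and runs on made-up data; with this notebook's variables it would be called as `score_for_concept(all_sentence_scores[0], uat_names, 'interstellar dust')`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-score-lookup-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "def score_for_concept(scores, names, concept):\n",
    "    # Hypothetical helper: invert {id: name} into {name: id}.\n",
    "    # If two IDs share a name, the last one wins.\n",
    "    name_to_id = {name: uat_id for uat_id, name in names.items()}\n",
    "    uat_id = name_to_id.get(concept.lower().strip())  # names in uat_names are lowercased\n",
    "    if uat_id is None:\n",
    "        raise KeyError(f'unknown concept: {concept!r}')\n",
    "    by_label = {s['label']: s['score'] for s in scores}\n",
    "    return by_label.get(uat_id)  # None if the model has no output for this ID\n",
    "\n",
    "\n",
    "# made-up IDs, names and scores\n",
    "toy_names = {'1': 'interstellar dust', '2': 'cosmology'}\n",
    "toy_scores = [{'label': '1', 'score': 0.99}, {'label': '2', 'score': 0.01}]\n",
    "dust_score = score_for_concept(toy_scores, toy_names, 'Interstellar Dust')\n",
    "print(dust_score)  # 0.99"
   ]
  },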
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "2cb6636d-7ccd-415f-b30e-68982ad9afbb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'label': '2363', 'score': 7.282670910768729e-09},\n",
       " {'label': '2364', 'score': 3.004765858349856e-07},\n",
       " {'label': '2365', 'score': 1.247675868398801e-06},\n",
       " {'label': '2366', 'score': 5.6775856904778266e-08},\n",
       " {'label': '2367', 'score': 3.278066174061678e-08},\n",
       " {'label': '2368', 'score': 1.39621681682911e-06},\n",
       " {'label': '2369', 'score': 0.00028976111207157373},\n",
       " {'label': '2370', 'score': 1.9995159163954668e-06},\n",
       " {'label': '2371', 'score': 2.577130089775892e-06},\n",
       " {'label': '2372', 'score': 3.240722179498334e-08}]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_sentence_scores[0][-10:]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8",
   "language": "python",
   "name": "python3.8"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}