ZYMScott committed on
Commit dd2a5fa · verified · 1 Parent(s): 3a7dea0

Upload tokenizer

Files changed (4)
  1. added_tokens.json +542 -0
  2. merges.txt +1 -1
  3. tokenizer.json +0 -0
  4. tokenizer_config.json +0 -0
added_tokens.json ADDED
@@ -0,0 +1,542 @@
+ {
+ "0813-124 phase II": 50522,
+ "090008": 50332,
+ "1.2.20": 50304,
+ "15H5D-4a": 50748,
+ "1692": 50508,
+ "174/2": 50543,
+ "17Nkhm-UP2": 50435,
+ "18EpOKYJ": 50706,
+ "200023": 50500,
+ "21A": 50674,
+ "24.1": 50783,
+ "301": 50623,
+ "3347689II": 50678,
+ "3937": 50659,
+ "477": 50792,
+ "49125": 50351,
+ "5D": 50314,
+ "640": 50321,
+ "670-83": 50702,
+ "675": 50728,
+ "6D370": 50415,
+ "757": 50392,
+ "78-1320": 50587,
+ "7A": 50517,
+ "80813": 50647,
+ "A1": 50716,
+ "A1122": 50428,
+ "A2-F21": 50786,
+ "A212-S19-A16": 50506,
+ "A23BA": 50457,
+ "A398-S21-F17": 50328,
+ "ACYC.E9L": 50497,
+ "ANU1": 50612,
+ "AR": 50777,
+ "ARAD": 50681,
+ "AR_0082": 50756,
+ "AS9": 50791,
+ "ATCC 13028": 50621,
+ "ATCC 39140": 50646,
+ "ATCC 43969": 50536,
+ "ATCC 51329": 50790,
+ "ATCC BAA-895": 50635,
+ "AVS0177": 50593,
+ "Annandia": 50514,
+ "ArsBeeUS": 50696,
+ "Arsenophonus": 50640,
+ "Arsenophonus apicola": 50637,
+ "Arsenophonus endosymbiont of Aleurodicus dispersus": 50648,
+ "Arsenophonus endosymbiont of Aphis craccivora": 50348,
+ "Arsenophonus nasoniae": 50712,
+ "Atlantibacter": 50775,
+ "Atlantibacter hermannii": 50759,
+ "Atlantibacter subterranea": 50421,
+ "BDA62-3": 50299,
+ "BHKY": 50364,
+ "BO-1": 50704,
+ "BPEN": 50310,
+ "BVAF": 50461,
+ "BY21311": 50430,
+ "Bacteria": 50690,
+ "Blochmannia": 50447,
+ "Blochmannia endosymbiont of Camponotus (Colobopsis) obliquus": 50779,
+ "Blochmannia endosymbiont of Camponotus modoc": 50586,
+ "Blochmannia endosymbiont of Camponotus nipponensis": 50525,
+ "Blochmannia endosymbiont of Colobopsis nipponica": 50422,
+ "Blochmannia endosymbiont of Polyrhachis (Hedomyrma) turneri": 50427,
+ "Brenneria": 50745,
+ "Brenneria goodwinii": 50259,
+ "Brenneria izadpanahii": 50708,
+ "Brenneria nigrifluens": 50436,
+ "Brenneria rubrifaciens": 50396,
+ "Brenneria ulupoensis": 50721,
+ "Bruguierivoracaceae": 50316,
+ "Buchnera": 50289,
+ "Buchnera aphidicola": 50505,
+ "Budviciaceae": 50433,
+ "Buttiauxella": 50750,
+ "Buttiauxella agrestis": 50740,
+ "Buttiauxella ferragutiae": 50356,
+ "C-002": 50598,
+ "C-005": 50504,
+ "C-006": 50362,
+ "C-050": 50741,
+ "C-7-2": 50575,
+ "CAVP490": 50582,
+ "CB": 50568,
+ "CCA6": 50758,
+ "CCUG 66741": 50776,
+ "CF-458": 50595,
+ "CFBP 3304": 50589,
+ "CFCC10813": 50401,
+ "CFPB1430": 50282,
+ "CFS1934": 50469,
+ "CQ10": 50785,
+ "CS-931": 50680,
+ "Candidatus": 50446,
+ "Candidatus Arsenophonus lipoptenae": 50496,
+ "Candidatus Blochmannia pennsylvanicus": 50484,
+ "Candidatus Blochmannia vafer": 50622,
+ "Candidatus Doolittlea endobia": 50669,
+ "Candidatus Fukatsuia symbiotica": 50701,
+ "Candidatus Gullanella endobia": 50766,
+ "Candidatus Hoaglandella endobia": 50509,
+ "Candidatus Mikella endobia": 50315,
+ "Candidatus Purcelliella pentastirinorum": 50738,
+ "Candidatus Riesia pediculicola": 50274,
+ "Candidatus Tachikawaea gelatinosa": 50366,
+ "Candidatus Westeberhardia cardiocondylae": 50261,
+ "Candidatus blochmannia chromaiodes": 50501,
+ "Candidatus ishikawaella capsulata": 50720,
+ "Candidatus moranella endobia": 50291,
+ "Candidatus sodalis pierantonius": 50608,
+ "Candidatus_antoea carbekii": 50407,
+ "Candidatus_ukatsuia": 50487,
+ "Cedecea": 50510,
+ "Cedecea lapagei": 50618,
+ "Cedecea neteri": 50719,
+ "Cf7303": 50743,
+ "Chania": 50558,
+ "Chania multitudinisentens": 50656,
+ "Citrobacter": 50523,
+ "Citrobacter amalonaticus": 50290,
+ "Citrobacter arsenatis": 50470,
+ "Citrobacter braakii": 50466,
+ "Citrobacter freundii": 50653,
+ "Citrobacter koseri": 50602,
+ "Citrobacter portucalensis": 50309,
+ "Citrobacter rodentium": 50664,
+ "Citrobacter sedlakii": 50423,
+ "Citrobacter tructae": 50463,
+ "Citrobacter werkmanii": 50414,
+ "Cp2": 50385,
+ "Cronobacter": 50402,
+ "Cronobacter condimenti": 50443,
+ "Cronobacter dublinensis": 50372,
+ "Cronobacter malonaticus": 50542,
+ "Cronobacter muytjensii": 50760,
+ "Cronobacter sakazakii": 50734,
+ "Cronobacter universalis": 50349,
+ "DH-S01": 50600,
+ "DSM 101947": 50442,
+ "DSM 102253": 50387,
+ "DSM 107547": 50677,
+ "DSM 15199": 50529,
+ "DSM 16636": 50493,
+ "DSM 16690": 50521,
+ "DSM 22758": 50585,
+ "DSM 32899": 50345,
+ "DSM 4481": 50533,
+ "DSM 4576": 50651,
+ "DSM 9389": 50762,
+ "Dickeya": 50545,
+ "Dickeya aquatica": 50502,
+ "Dickeya chrysanthemi": 50393,
+ "Dickeya dadantii": 50703,
+ "Dickeya dianthicola": 50524,
+ "Dickeya fangzhongdai": 50692,
+ "Dickeya parazeae": 50302,
+ "Dickeya poaceiphila": 50404,
+ "Dickeya solani": 50604,
+ "Dickeya zeae": 50263,
+ "Doolittlea": 50382,
+ "Duffyella": 50660,
+ "Duffyella gerundensis": 50645,
+ "EBP3064": 50429,
+ "EN-119": 50537,
+ "ERMR1:05": 50267,
+ "Eb661": 50569,
+ "Ech1591": 50699,
+ "Ech586": 50337,
+ "Ech703": 50273,
+ "Edwardsiella": 50601,
+ "Edwardsiella anguillarum": 50313,
+ "Edwardsiella hoshinae": 50462,
+ "Edwardsiella ictaluri": 50285,
+ "Edwardsiella piscicida": 50476,
+ "Edwardsiella tarda": 50691,
+ "Enterobacter": 50311,
+ "Enterobacter asburiae": 50705,
+ "Enterobacter bugandensis": 50467,
+ "Enterobacter chengduensis": 50494,
+ "Enterobacter cloacae": 50667,
+ "Enterobacter hormaechei": 50688,
+ "Enterobacter huaxiensis": 50363,
+ "Enterobacter ludwigii": 50742,
+ "Enterobacter mori": 50295,
+ "Enterobacter oligotrophicus": 50685,
+ "Enterobacter pseudoroggenkampii": 50383,
+ "Enterobacter roggenkampii": 50540,
+ "Enterobacter sichuanensis": 50276,
+ "Enterobacter soli": 50679,
+ "Enterobacterales": 50513,
+ "Enterobacteriaceae": 50610,
+ "Enterobacteriaceae endosymbiont of Macroplea mutica": 50489,
+ "Enterobacteriaceae endosymbiont of Plateumaris pusilla": 50344,
+ "Enterobacteriaceae endosymbiont of_acroplea mutica": 50636,
+ "EpK1/15": 50528,
+ "ErCicurvipes": 50403,
+ "Erwinia": 50475,
+ "Erwinia amylovora": 50319,
+ "Erwinia billingiae": 50652,
+ "Erwinia persicina": 50693,
+ "Erwinia pyrifoliae": 50359,
+ "Erwinia rhapontici": 50516,
+ "Erwinia sorbitola": 50736,
+ "Erwinia tasmaniensis": 50606,
+ "Erwinia tracheiphila": 50733,
+ "Erwiniaceae": 50576,
+ "Escherichia": 50723,
+ "Escherichia albertii": 50642,
+ "Escherichia coli ": 50628,
+ "Escherichia fergusonii": 50279,
+ "Escherichia marmotae": 50431,
+ "Et1/99": 50668,
+ "FDAARGOS 1447": 50499,
+ "FDAARGOS_1499": 50481,
+ "FDAARGOS_165": 50346,
+ "FDAARGOS_186": 50418,
+ "FDAARGOS_392": 50468,
+ "FDAARGOS_408": 50491,
+ "FDAARGOS_500": 50384,
+ "FDAARGOS_616": 50455,
+ "FDAARGOS_730": 50794,
+ "FDAARGOS_926": 50599,
+ "FDAARGOS_940": 50675,
+ "FIN": 50746,
+ "FN20211": 50488,
+ "FRB141": 50338,
+ "FRB97": 50486,
+ "FRM16": 50343,
+ "FY-07": 50451,
+ "FY158": 50737,
+ "G5": 50551,
+ "G6": 50386,
+ "Gammaproteobacteria": 50747,
+ "Gibbsiella": 50503,
+ "Gibbsiella quercinecans": 50571,
+ "Gullanella": 50334,
+ "H4-C11": 50465,
+ "HI4320": 50550,
+ "HS1": 50726,
+ "HS11286": 50769,
+ "HYN0051": 50754,
+ "Hafnia": 50312,
+ "Hafnia alvei": 50397,
+ "Hafnia paralvei": 50278,
+ "Hafniaceae": 50672,
+ "Hoaglandella": 50388,
+ "IFB5427": 50266,
+ "IP32953": 50419,
+ "Iran 50": 50567,
+ "Ishikawaella": 50574,
+ "J780": 50552,
+ "JH01": 50444,
+ "JK2.1": 50293,
+ "JZ-GX1": 50770,
+ "JZB2120001": 50262,
+ "Jejubacter": 50573,
+ "Jejubacter calystegiae": 50566,
+ "K-12 substr. MG1655": 50518,
+ "K61": 50725,
+ "KACC 18508": 50774,
+ "KC-Pc-HB1": 50671,
+ "KMM821": 50287,
+ "KSNA2": 50590,
+ "KUDC3025": 50485,
+ "Ka37751": 50639,
+ "Kalro": 50411,
+ "Klebsiella": 50474,
+ "Klebsiella aerogenes": 50630,
+ "Klebsiella africana": 50594,
+ "Klebsiella electrica": 50275,
+ "Klebsiella huaxiensis": 50439,
+ "Klebsiella michiganensis": 50417,
+ "Klebsiella oxytoca": 50298,
+ "Klebsiella pasteurii": 50546,
+ "Klebsiella pneumoniae": 50557,
+ "Klebsiella quasipneumoniae": 50658,
+ "Klebsiella variicola": 50507,
+ "Kluyvera": 50613,
+ "Kluyvera ascorbata": 50520,
+ "Kluyvera intermedia": 50534,
+ "Kosakonia": 50413,
+ "Kosakonia arachidis": 50644,
+ "Kosakonia cowanii": 50325,
+ "Kosakonia oryzae": 50318,
+ "Kosakonia oryzendophytica": 50687,
+ "Kosakonia pseudosacchari": 50673,
+ "Kosakonia radicincitans": 50272,
+ "Kosakonia sacchari": 50554,
+ "KqPF26": 50257,
+ "L6": 50292,
+ "LEMB11": 50614,
+ "LF7a": 50580,
+ "LH84-a": 50342,
+ "LJ1": 50663,
+ "LMG 23823": 50649,
+ "LMG 23826": 50731,
+ "LMG 24197": 50391,
+ "LMG 24199": 50297,
+ "LMG 26250": 50532,
+ "LMG24200": 50729,
+ "LST-1": 50562,
+ "LT-1": 50270,
+ "LT2": 50347,
+ "LTYR-11Z": 50695,
+ "LY-1": 50498,
+ "Leclercia": 50684,
+ "Leclercia adecarboxylata": 50381,
+ "Leclercia pneumoniae": 50781,
+ "Lelliottia": 50445,
+ "Lelliottia steviae": 50722,
+ "Leminorella": 50339,
+ "Leminorella richardii": 50512,
+ "Limnobaculum": 50333,
+ "Limnobaculum parvum": 50670,
+ "Limnobaculum zhutongyuii": 50300,
+ "Lonsdalea": 50634,
+ "Lonsdalea britannica": 50350,
+ "Lonsdalea populi": 50519,
+ "Lsch": 50379,
+ "ME23": 50563,
+ "MS2": 50424,
+ "MiY-A": 50395,
+ "Mikella": 50572,
+ "Mixta": 50556,
+ "Mixta gaviniae": 50744,
+ "Mixta hanseatica": 50390,
+ "Mixta intestinalis": 50570,
+ "Moellerella": 50788,
+ "Moellerella wisconsensis": 50714,
+ "Moranella": 50654,
+ "Morganella": 50326,
+ "Morganella morganii": 50360,
+ "Morganellaceae": 50441,
+ "Mpkobe": 50753,
+ "Musicola": 50603,
+ "Musicola paradisiaca": 50625,
+ "N-5-1": 50609,
+ "N2-1": 50416,
+ "N268-08": 50559,
+ "NA": 50629,
+ "NCPPB 569": 50268,
+ "NCTC 14382": 50605,
+ "NCTC 9529": 50412,
+ "NCTC11466": 50697,
+ "NCTC12003": 50281,
+ "NCTC12148": 50377,
+ "NCTC12151": 50449,
+ "NCTC12284": 50694,
+ "NCTC13188": 50495,
+ "NIBIO1392": 50555,
+ "OLIH": 50438,
+ "Ola 51": 50330,
+ "PA13": 50305,
+ "PCVAL": 50432,
+ "PPO 9019": 50711,
+ "PR-310": 50265,
+ "PRI-2C": 50477,
+ "Pantoea": 50511,
+ "Pantoea agglomerans": 50483,
+ "Pantoea alfalfae": 50323,
+ "Pantoea alhagi": 50535,
+ "Pantoea ananatis": 50643,
+ "Pantoea deleyi": 50453,
+ "Pantoea dispersa": 50479,
+ "Pantoea eucalypti": 50264,
+ "Pantoea eucrina": 50471,
+ "Pantoea soli": 50294,
+ "Pantoea stewartii": 50757,
+ "Pantoea vagans": 50340,
+ "Pectobacteriaceae": 50341,
+ "Pectobacterium": 50665,
+ "Pectobacterium aquaticum": 50303,
+ "Pectobacterium aroidearum": 50420,
+ "Pectobacterium atrosepticum": 50778,
+ "Pectobacterium brasiliense": 50378,
+ "Pectobacterium cacticida": 50394,
+ "Pectobacterium carotovorum": 50369,
+ "Pectobacterium colocasium": 50727,
+ "Pectobacterium odoriferum": 50408,
+ "Pectobacterium parmentieri": 50713,
+ "Pectobacterium parvum": 50492,
+ "Pectobacterium polaris": 50768,
+ "Pectobacterium punjabense": 50288,
+ "Pectobacterium quasiaquaticum": 50380,
+ "Pectobacterium wasabiae": 50771,
+ "Photorhabdus": 50472,
+ "Photorhabdus akhurstii": 50717,
+ "Photorhabdus asymbiotica": 50324,
+ "Photorhabdus laumondii": 50548,
+ "Photorhabdus thracensis": 50322,
+ "Phytobacter": 50538,
+ "Phytobacter diazotrophicus": 50650,
+ "Plesiomonas": 50710,
+ "Plesiomonas shigelloides": 50440,
+ "Pluralibacter": 50478,
+ "Pluralibacter gergoviae": 50409,
+ "Pragia": 50591,
+ "Pragia fontium": 50277,
+ "Profftia": 50666,
+ "Proteus": 50773,
+ "Proteus hauseri": 50626,
+ "Proteus mirabilis": 50454,
+ "Proteus penneri": 50751,
+ "Proteus terrae": 50541,
+ "Providencia": 50683,
+ "Providencia alcalifaciens": 50655,
+ "Providencia hangzhouensis": 50689,
+ "Providencia heimbachae": 50755,
+ "Providencia huaxiensis": 50588,
+ "Providencia rettgeri": 50373,
+ "Providencia stuartii": 50700,
+ "Pseudocitrobacter": 50375,
+ "Pseudocitrobacter corydidari": 50676,
+ "Pseudomonadota": 50682,
+ "Purcelliella": 50553,
+ "RB-25": 50730,
+ "Rahnella": 50616,
+ "Rahnella aceris": 50286,
+ "Rahnella sikkimica": 50280,
+ "Rahnella victoriana": 50389,
+ "Raoultella": 50354,
+ "Raoultella planticola": 50464,
+ "Raoultella terrigena": 50355,
+ "Riesia": 50724,
+ "S07-698": 50399,
+ "S1": 50617,
+ "S178-2": 50633,
+ "S2-A69": 50620,
+ "SCPM-O-B-7604": 50260,
+ "SE6-1": 50405,
+ "SGAir0282": 50579,
+ "SII": 50732,
+ "SK": 50564,
+ "SNU WT2": 50793,
+ "SOPE": 50320,
+ "SRCM103226": 50531,
+ "SS95": 50357,
+ "SWHEFF_49": 50752,
+ "Sakai substr. RIMD 0509952": 50780,
+ "Salmonella": 50515,
+ "Salmonella bongori": 50398,
+ "Salmonella enterica": 50258,
+ "Sample 167": 50400,
+ "Sb-24": 50641,
+ "Scandinavium": 50269,
+ "Scandinavium goeteborgense": 50544,
+ "Schneideria": 50448,
+ "Serratia": 50374,
+ "Serratia entomophila": 50661,
+ "Serratia ficaria": 50284,
+ "Serratia fonticola": 50549,
+ "Serratia inhibens": 50335,
+ "Serratia liquefaciens": 50581,
+ "Serratia nematodiphila": 50739,
+ "Serratia plymuthica": 50307,
+ "Serratia proteamaculans": 50662,
+ "Serratia quinivorans": 50761,
+ "Serratia rhizosphaerae": 50490,
+ "Serratia rubidaea": 50473,
+ "Serratia surfactantfaciens": 50526,
+ "Serratia symbiotica": 50763,
+ "Serratia ureilytica": 50353,
+ "Shigella": 50565,
+ "Shigella dysenteriae": 50456,
+ "Shigella flexneri": 50615,
+ "Shigella sonnei": 50410,
+ "Shimwellia": 50331,
+ "Shimwellia blattae": 50627,
+ "Siccibacter": 50539,
+ "Siccibacter colletis": 50577,
+ "Sodalis": 50657,
+ "Sodalis endosymbiont of Henestaris halophilus": 50709,
+ "Sodalis glossinidius": 50306,
+ "Sodalis praecaptivus": 50376,
+ "SyEd1": 50784,
+ "Symbiopectobacterium": 50749,
+ "Symbiopectobacterium purcellii": 50638,
+ "T6": 50296,
+ "TA9759": 50452,
+ "TBY01": 50271,
+ "THO-011": 50406,
+ "TTO1": 50329,
+ "Tachikawaea": 50434,
+ "Tatumella": 50718,
+ "Tatumella citrea": 50327,
+ "Trabulsiella": 50767,
+ "Trabulsiella odontotermitis": 50425,
+ "US": 50530,
+ "USDA": 50256,
+ "USDA-ARS-USMARC-60222": 50596,
+ "UwTKB": 50584,
+ "VKH10": 50632,
+ "W65": 50482,
+ "WCHECl-C4 = WCHECh050004": 50787,
+ "WCHKl090001": 50365,
+ "WCHPr000369": 50707,
+ "WPP14": 50764,
+ "Westeberhardia": 50336,
+ "Wigglesworthia": 50698,
+ "Wigglesworthia glossinidia": 50547,
+ "Winslowiella": 50458,
+ "Winslowiella toletana": 50772,
+ "XL123": 50283,
+ "XL95": 50358,
+ "Xenorhabdus": 50460,
+ "Xenorhabdus budapestensis": 50624,
+ "Xenorhabdus doucetiae": 50371,
+ "Xenorhabdus griffiniae": 50619,
+ "Xenorhabdus hominickii": 50352,
+ "Xenorhabdus nematophila": 50715,
+ "Xenorhabdus poinarii": 50317,
+ "YD25": 50437,
+ "YF8": 50735,
+ "YRA": 50480,
+ "YSD YN2": 50782,
+ "Y_sim_228": 50368,
+ "Yersinia": 50367,
+ "Yersinia aldovae": 50301,
+ "Yersinia alsatica": 50527,
+ "Yersinia canariae": 50765,
+ "Yersinia hibernica": 50611,
+ "Yersinia intermedia": 50789,
+ "Yersinia mollaretii": 50361,
+ "Yersinia pestis": 50597,
+ "Yersinia pseudotuberculosis": 50795,
+ "Yersinia rohdei": 50450,
+ "Yersinia ruckeri": 50308,
+ "Yersinia similis": 50459,
+ "Yersiniaceae": 50560,
+ "ZJ-FGZX1": 50607,
+ "ZN2": 50370,
+ "[Enterobacter] lignolyticus": 50631,
+ "[Pantoea] beijingensis": 50426,
+ "morsitans": 50686,
+ "obscurior": 50578,
+ "secondary endosymbiont of Ctenarytaina eucalypti": 50561,
+ "secondary endosymbiont of Heteropsylla cubana": 50592,
+ "secondary endosymbiont of Trabutina mannipara": 50583
+ }
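A reviewer may want to sanity-check a token-to-ID mapping like the one added above, for example that the IDs form one gap-free block with no duplicates. The following is a minimal sketch of such a check, not part of this commit; the helper name `summarize_added_tokens` is hypothetical, and the three-entry `sample` is an excerpt copied from the diff, not the full file.

```python
import json
from collections import Counter


def summarize_added_tokens(mapping):
    """Return (lowest id, highest id, missing ids, duplicated ids).

    `mapping` is a token -> id dict shaped like added_tokens.json.
    This helper is hypothetical and only illustrates the check.
    """
    ids = sorted(mapping.values())
    counts = Counter(ids)
    # IDs assigned to more than one token.
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    # IDs inside the [min, max] range that no token uses.
    missing = sorted(set(range(ids[0], ids[-1] + 1)) - set(ids))
    return ids[0], ids[-1], missing, duplicated


# Three entries excerpted from the diff above; the real file has many more.
sample = json.loads(
    '{"USDA": 50256, "KqPF26": 50257, "Salmonella enterica": 50258}'
)
print(summarize_added_tokens(sample))
```

To run it against the real file, one would replace `sample` with `json.load(open("added_tokens.json"))`, assuming the file is available locally under that name.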
merges.txt CHANGED
@@ -1,4 +1,4 @@
- #version: 0.2 - Trained by `huggingface/tokenizers`
+ #version: 0.2
  G C
  A A
  U U
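The only change to merges.txt is the header line; the merge rules themselves are untouched. As a rough illustration of why that is likely harmless to a reader of the file, here is a sketch of a parser that skips the `#version` header and reads each remaining line as a space-separated pair. The function `parse_merges` is hypothetical and is not the loader used by any particular library.

```python
def parse_merges(text):
    """Parse merges.txt content into a list of (left, right) pairs.

    Assumption: a line starting with "#version" is a header comment and
    every other non-empty line holds exactly two space-separated symbols.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#version"):
            continue  # skip blank lines and the header
        left, right = line.split(" ")
        pairs.append((left, right))
    return pairs


old_header = "#version: 0.2 - Trained by `huggingface/tokenizers`\nG C\nA A\nU U\n"
new_header = "#version: 0.2\nG C\nA A\nU U\n"
# Under the stated assumption, both versions yield the same merge rules.
print(parse_merges(old_header) == parse_merges(new_header))
```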
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
The diff for this file is too large to render. See raw diff