KoichiYasuoka committed
Commit 1bdd9b7 · 1 Parent(s): e9e8da5

initial release

Files changed (8)
  1. README.md +30 -0
  2. config.json +1832 -0
  3. maker.py +120 -0
  4. pytorch_model.bin +3 -0
  5. special_tokens_map.json +23 -0
  6. tokenizer.json +3 -0
  7. tokenizer_config.json +2063 -0
  8. ud.py +159 -0
README.md ADDED
@@ -0,0 +1,30 @@
+ ---
+ language:
+ - "th"
+ tags:
+ - "thai"
+ - "pos"
+ - "dependency-parsing"
+ base_model: scb10x/llama3.2-typhoon2-1b
+ datasets:
+ - "universal_dependencies"
+ license: "llama3.2"
+ pipeline_tag: "token-classification"
+ widget:
+ - text: "หลายหัวดีกว่าหัวเดียว"
+ ---
+
+ # llama3.2-typhoon2-1b-ud-embeds
+
+ ## Model Description
+
+ This is a LLaMA model pre-trained for POS-tagging and dependency-parsing, derived from [llama3.2-typhoon2-1b](https://huggingface.co/scb10x/llama3.2-typhoon2-1b) and refined for the [Thai Universal Dependency Treebank](https://github.com/nlp-chula/TUD).
+
+ ## How to Use
+
+ ```py
+ from transformers import pipeline
+ nlp=pipeline("universal-dependencies","KoichiYasuoka/llama3.2-typhoon2-1b-ud-embeds",trust_remote_code=True)
+ print(nlp("หลายหัวดีกว่าหัวเดียว"))
+ ```
+
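Reviewer note: the `id2label` table in `config.json` packs several annotations into each label string. Judging only from the entries listed in this diff, a label looks like a UPOS tag, optionally followed by `Feature=Value` pairs and a final field that is `_`, `root`, or a relation name prefixed with `l-` or `r-`. Some labels also carry a `B-`/`I-` prefix or a trailing `.`. The sketch below splits a label into those parts. The names `ParsedLabel` and `parse_label` are hypothetical, and reading `l-`/`r-` as a direction marker is an assumption; `ud.py` in this commit remains the authoritative decoder.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ParsedLabel:
    """Parts of one id2label string (hypothetical helper type)."""
    prefix: Optional[str]          # "B" or "I" when the label has that prefix
    upos: str                      # e.g. "NOUN"
    dotted: bool                   # True when the first field ends with "."
    feats: Dict[str, str] = field(default_factory=dict)
    direction: Optional[str] = None  # "l" or "r" (assumed direction marker)
    deprel: Optional[str] = None     # e.g. "case", "root"; None for "_"


def parse_label(label: str) -> ParsedLabel:
    """Split a label such as "NOUN|NounType=Class|r-clf" into its fields."""
    fields = label.split("|")
    head = fields[0]

    # Optional "B-" / "I-" prefix on the first field.
    prefix = None
    if head[:2] in ("B-", "I-"):
        prefix, head = head[0], head[2:]

    # Optional trailing "." on the first field; its meaning is not decoded here.
    dotted = head.endswith(".")
    upos = head.rstrip(".")

    parsed = ParsedLabel(prefix=prefix, upos=upos, dotted=dotted)
    for item in fields[1:]:
        if "=" in item:
            # Morphological feature written as Feature=Value.
            key, value = item.split("=", 1)
            parsed.feats[key] = value
        elif item == "root":
            parsed.deprel = "root"
        elif item[:2] in ("l-", "r-"):
            parsed.direction, parsed.deprel = item[0], item[2:]
        # A bare "_" carries no relation, so deprel stays None.
    return parsed


if __name__ == "__main__":
    for example in ("NOUN|NounType=Class|r-clf", "ADP|_", "B-ADP.", "PROPN|NameType=Geo|root"):
        print(example, "->", parse_label(example))
```

A helper like this is only useful for inspecting the label inventory, for example to count how many distinct relations each UPOS tag is paired with.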
config.json ADDED
@@ -0,0 +1,1832 @@
+ {
+   "architectures": [
+     "LlamaForTokenClassification"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 128000,
+   "custom_pipelines": {
+     "upos": {
+       "impl": "ud.BellmanFordTokenClassificationPipeline",
+       "pt": "AutoModelForTokenClassification"
+     },
+     "universal-dependencies": {
+       "impl": "ud.UniversalDependenciesPipeline",
+       "pt": "AutoModelForTokenClassification"
+     }
+   },
+   "eos_token_id": 128009,
+   "head_dim": 64,
+   "hidden_act": "silu",
+   "hidden_size": 2048,
+   "id2label": {
+     "0": "ADP",
+     "1": "ADP.",
+     "2": "ADP|Foreign=Yes|_",
+     "3": "ADP|Foreign=Yes|l-case",
+     "4": "ADP|NounType=Class|_",
+     "5": "ADP|NounType=Class|l-case",
+     "6": "ADP|Prefix=Yes|_",
+     "7": "ADP|Prefix=Yes|l-case",
+     "8": "ADP|Prefix=Yes|l-mark",
+     "9": "ADP|_",
+     "10": "ADP|l-acl",
+     "11": "ADP|l-advcl",
+     "12": "ADP|l-advmod",
+     "13": "ADP|l-case",
+     "14": "ADP|l-cc",
+     "15": "ADP|l-dep",
+     "16": "ADP|l-fixed",
+     "17": "ADP|l-flat",
+     "18": "ADP|l-mark",
+     "19": "ADP|l-nmod",
+     "20": "ADP|l-nsubj",
+     "21": "ADP|l-obl",
+     "22": "ADP|l-orphan",
+     "23": "ADP|r-acl",
+     "24": "ADP|r-advmod",
+     "25": "ADP|r-case",
+     "26": "ADP|r-compound",
+     "27": "ADP|r-conj",
+     "28": "ADP|r-fixed",
+     "29": "ADP|r-flat",
+     "30": "ADP|r-obl",
+     "31": "ADP|r-orphan",
+     "32": "ADP|root",
+     "33": "ADV",
+     "34": "ADV.",
+     "35": "ADV|Foreign=Yes|_",
+     "36": "ADV|Foreign=Yes|l-advmod",
+     "37": "ADV|Foreign=Yes|r-advmod",
+     "38": "ADV|NumType=Mult|_",
+     "39": "ADV|NumType=Mult|r-advmod",
+     "40": "ADV|PartType=Adv|_",
+     "41": "ADV|PartType=Adv|l-advmod",
+     "42": "ADV|PartType=Adv|l-mark",
+     "43": "ADV|PartType=Adv|r-advmod",
+     "44": "ADV|PartType=Enp|_",
+     "45": "ADV|PartType=Enp|l-advmod",
+     "46": "ADV|PartType=Enp|r-advmod",
+     "47": "ADV|PartType=Int|_",
+     "48": "ADV|PartType=Int|r-advmod",
+     "49": "ADV|PartType=Int|r-fixed",
+     "50": "ADV|Prefix=Yes|_",
+     "51": "ADV|Prefix=Yes|l-advmod",
+     "52": "ADV|Prefix=Yes|l-mark",
+     "53": "ADV|Prefix=Yes|r-advmod",
+     "54": "ADV|_",
+     "55": "ADV|l-acl",
+     "56": "ADV|l-advcl",
+     "57": "ADV|l-advmod",
+     "58": "ADV|l-aux",
+     "59": "ADV|l-case",
+     "60": "ADV|l-compound",
+     "61": "ADV|l-dep",
+     "62": "ADV|l-det",
+     "63": "ADV|l-discourse",
+     "64": "ADV|l-fixed",
+     "65": "ADV|l-mark",
+     "66": "ADV|l-orphan",
+     "67": "ADV|r-acl",
+     "68": "ADV|r-advcl",
+     "69": "ADV|r-advmod",
+     "70": "ADV|r-aux",
+     "71": "ADV|r-compound",
+     "72": "ADV|r-conj",
+     "73": "ADV|r-det",
+     "74": "ADV|r-fixed",
+     "75": "ADV|r-flat",
+     "76": "ADV|r-mark",
+     "77": "ADV|r-nmod",
+     "78": "ADV|r-obj",
+     "79": "ADV|r-orphan",
+     "80": "ADV|r-xcomp",
+     "81": "ADV|root",
+     "82": "AUX",
+     "83": "AUX.",
+     "84": "AUX|Foreign=Yes|_",
+     "85": "AUX|Foreign=Yes|l-aux",
+     "86": "AUX|NounType=Class|_",
+     "87": "AUX|NounType=Class|r-appos",
+     "88": "AUX|Prefix=Yes|_",
+     "89": "AUX|Prefix=Yes|l-aux",
+     "90": "AUX|Prefix=Yes|r-aux",
+     "91": "AUX|VerbType=Cop|_",
+     "92": "AUX|VerbType=Cop|l-acl",
+     "93": "AUX|VerbType=Cop|l-advcl",
+     "94": "AUX|VerbType=Cop|l-aux",
+     "95": "AUX|VerbType=Cop|l-cop",
+     "96": "AUX|VerbType=Cop|r-acl",
+     "97": "AUX|VerbType=Cop|r-advcl",
+     "98": "AUX|VerbType=Cop|r-aux",
+     "99": "AUX|VerbType=Cop|r-conj",
+     "100": "AUX|VerbType=Cop|r-mark",
+     "101": "AUX|VerbType=Cop|root",
+     "102": "AUX|_",
+     "103": "AUX|l-advmod",
+     "104": "AUX|l-aux",
+     "105": "AUX|l-cop",
+     "106": "AUX|l-mark",
+     "107": "AUX|r-acl",
+     "108": "AUX|r-advmod",
+     "109": "AUX|r-aux",
+     "110": "AUX|r-ccomp",
+     "111": "AUX|r-clf",
+     "112": "AUX|r-compound",
+     "113": "AUX|r-conj",
+     "114": "AUX|r-fixed",
+     "115": "AUX|root",
+     "116": "B-ADP",
+     "117": "B-ADP.",
+     "118": "B-ADV",
+     "119": "B-ADV.",
+     "120": "B-AUX",
+     "121": "B-AUX.",
+     "122": "B-CCONJ",
+     "123": "B-CCONJ.",
+     "124": "B-DET",
+     "125": "B-DET.",
+     "126": "B-NOUN",
+     "127": "B-NOUN.",
+     "128": "B-NUM",
+     "129": "B-NUM.",
+     "130": "B-PART",
+     "131": "B-PART.",
+     "132": "B-PRON",
+     "133": "B-PRON.",
+     "134": "B-PROPN",
+     "135": "B-PROPN.",
+     "136": "B-PUNCT",
+     "137": "B-PUNCT.",
+     "138": "B-SCONJ",
+     "139": "B-SCONJ.",
+     "140": "B-SYM",
+     "141": "B-SYM.",
+     "142": "B-VERB",
+     "143": "B-VERB.",
+     "144": "CCONJ",
+     "145": "CCONJ.",
+     "146": "CCONJ|Foreign=Yes|_",
+     "147": "CCONJ|Foreign=Yes|l-cc",
+     "148": "CCONJ|PronType=Prs|_",
+     "149": "CCONJ|PronType=Prs|l-cc",
+     "150": "CCONJ|_",
+     "151": "CCONJ|l-advmod",
+     "152": "CCONJ|l-case",
+     "153": "CCONJ|l-cc",
+     "154": "CCONJ|l-conj",
+     "155": "CCONJ|l-discourse",
+     "156": "CCONJ|l-fixed",
+     "157": "CCONJ|l-flat",
+     "158": "CCONJ|l-mark",
+     "159": "CCONJ|l-nsubj",
+     "160": "CCONJ|l-obj",
+     "161": "CCONJ|l-orphan",
+     "162": "CCONJ|r-cc",
+     "163": "CCONJ|r-compound",
+     "164": "CCONJ|r-fixed",
+     "165": "CCONJ|r-mark",
+     "166": "DET",
+     "167": "DET.",
+     "168": "DET|NumType=Mult|_",
+     "169": "DET|NumType=Mult|l-det",
+     "170": "DET|PartType=Emp|_",
+     "171": "DET|PartType=Emp|r-det",
+     "172": "DET|PartType=Int|_",
+     "173": "DET|PartType=Int|r-det",
+     "174": "DET|_",
+     "175": "DET|l-compound",
+     "176": "DET|l-det",
+     "177": "DET|l-discourse",
+     "178": "DET|l-nsubj",
+     "179": "DET|l-obl",
+     "180": "DET|l-orphan",
+     "181": "DET|r-advmod",
+     "182": "DET|r-compound",
+     "183": "DET|r-conj",
+     "184": "DET|r-dep",
+     "185": "DET|r-det",
+     "186": "DET|r-fixed",
+     "187": "DET|r-flat",
+     "188": "DET|r-list",
+     "189": "DET|r-nmod",
+     "190": "DET|r-nummod",
+     "191": "DET|r-obl",
+     "192": "DET|r-orphan",
+     "193": "I-ADP",
+     "194": "I-ADP.",
+     "195": "I-ADV",
+     "196": "I-ADV.",
+     "197": "I-AUX",
+     "198": "I-AUX.",
+     "199": "I-CCONJ",
+     "200": "I-CCONJ.",
+     "201": "I-DET",
+     "202": "I-DET.",
+     "203": "I-NOUN",
+     "204": "I-NOUN.",
+     "205": "I-NUM",
+     "206": "I-NUM.",
+     "207": "I-PART",
+     "208": "I-PART.",
+     "209": "I-PRON",
+     "210": "I-PRON.",
+     "211": "I-PROPN",
+     "212": "I-PROPN.",
+     "213": "I-PUNCT",
+     "214": "I-PUNCT.",
+     "215": "I-SCONJ",
+     "216": "I-SCONJ.",
+     "217": "I-SYM",
+     "218": "I-SYM.",
+     "219": "I-VERB",
+     "220": "I-VERB.",
+     "221": "NOUN",
+     "222": "NOUN.",
+     "223": "NOUN|Abbr=Yes|Foreign=Yes|_",
+     "224": "NOUN|Abbr=Yes|Foreign=Yes|r-nmod",
+     "225": "NOUN|Abbr=Yes|Prefix=Yes|_",
+     "226": "NOUN|Abbr=Yes|Prefix=Yes|l-flat",
+     "227": "NOUN|Abbr=Yes|_",
+     "228": "NOUN|Abbr=Yes|l-flat",
+     "229": "NOUN|Abbr=Yes|l-nmod",
+     "230": "NOUN|Abbr=Yes|l-nsubj",
+     "231": "NOUN|Abbr=Yes|l-obl",
+     "232": "NOUN|Abbr=Yes|r-acl",
+     "233": "NOUN|Abbr=Yes|r-appos",
+     "234": "NOUN|Abbr=Yes|r-clf",
+     "235": "NOUN|Abbr=Yes|r-conj",
+     "236": "NOUN|Abbr=Yes|r-fixed",
+     "237": "NOUN|Abbr=Yes|r-flat",
+     "238": "NOUN|Abbr=Yes|r-nmod",
+     "239": "NOUN|Abbr=Yes|r-obj",
+     "240": "NOUN|Abbr=Yes|r-obl",
+     "241": "NOUN|Foreign=Yes|NounType=Class|_",
+     "242": "NOUN|Foreign=Yes|NounType=Class|r-clf",
+     "243": "NOUN|Foreign=Yes|NounType=Class|r-obj",
+     "244": "NOUN|Foreign=Yes|Prefix=Yes|_",
+     "245": "NOUN|Foreign=Yes|Prefix=Yes|l-flat",
+     "246": "NOUN|Foreign=Yes|Prefix=Yes|r-appos",
+     "247": "NOUN|Foreign=Yes|_",
+     "248": "NOUN|Foreign=Yes|l-dislocated",
+     "249": "NOUN|Foreign=Yes|l-flat",
+     "250": "NOUN|Foreign=Yes|l-nmod",
+     "251": "NOUN|Foreign=Yes|l-nsubj",
+     "252": "NOUN|Foreign=Yes|l-obl",
+     "253": "NOUN|Foreign=Yes|r-acl",
+     "254": "NOUN|Foreign=Yes|r-advcl",
+     "255": "NOUN|Foreign=Yes|r-advmod",
+     "256": "NOUN|Foreign=Yes|r-appos",
+     "257": "NOUN|Foreign=Yes|r-ccomp",
+     "258": "NOUN|Foreign=Yes|r-clf",
+     "259": "NOUN|Foreign=Yes|r-compound",
+     "260": "NOUN|Foreign=Yes|r-conj",
+     "261": "NOUN|Foreign=Yes|r-flat",
+     "262": "NOUN|Foreign=Yes|r-iobj",
+     "263": "NOUN|Foreign=Yes|r-list",
+     "264": "NOUN|Foreign=Yes|r-nmod",
+     "265": "NOUN|Foreign=Yes|r-obj",
+     "266": "NOUN|Foreign=Yes|r-obl",
+     "267": "NOUN|Foreign=Yes|r-xcomp",
+     "268": "NOUN|Foreign=Yes|root",
+     "269": "NOUN|NameType=Com|_",
+     "270": "NOUN|NameType=Com|r-nmod",
+     "271": "NOUN|NameType=Geo|_",
+     "272": "NOUN|NameType=Geo|l-nsubj",
+     "273": "NOUN|NameType=Geo|r-nmod",
+     "274": "NOUN|NameType=Geo|r-obj",
+     "275": "NOUN|NameType=Nat|_",
+     "276": "NOUN|NameType=Nat|r-nmod",
+     "277": "NOUN|NameType=Oth|_",
+     "278": "NOUN|NameType=Oth|l-nsubj",
+     "279": "NOUN|NameType=Oth|r-conj",
+     "280": "NOUN|NameType=Oth|r-flat",
+     "281": "NOUN|NameType=Oth|r-nmod",
+     "282": "NOUN|NameType=Pro|_",
+     "283": "NOUN|NameType=Pro|r-nmod",
+     "284": "NOUN|NameType=Prs|_",
+     "285": "NOUN|NameType=Prs|l-nsubj",
+     "286": "NOUN|NameType=Prs|r-nmod",
+     "287": "NOUN|NounType=Class|Prefix=Yes|_",
+     "288": "NOUN|NounType=Class|Prefix=Yes|l-advcl",
+     "289": "NOUN|NounType=Class|Prefix=Yes|l-advmod",
+     "290": "NOUN|NounType=Class|Prefix=Yes|l-mark",
+     "291": "NOUN|NounType=Class|Prefix=Yes|l-nmod",
+     "292": "NOUN|NounType=Class|Prefix=Yes|l-nsubj",
+     "293": "NOUN|NounType=Class|Prefix=Yes|r-advcl",
+     "294": "NOUN|NounType=Class|Prefix=Yes|r-clf",
+     "295": "NOUN|NounType=Class|Prefix=Yes|r-nmod",
+     "296": "NOUN|NounType=Class|Prefix=Yes|r-obj",
+     "297": "NOUN|NounType=Class|_",
+     "298": "NOUN|NounType=Class|l-advcl",
+     "299": "NOUN|NounType=Class|l-advmod",
+     "300": "NOUN|NounType=Class|l-clf",
+     "301": "NOUN|NounType=Class|l-dislocated",
+     "302": "NOUN|NounType=Class|l-nmod",
+     "303": "NOUN|NounType=Class|l-nsubj",
+     "304": "NOUN|NounType=Class|l-obj",
+     "305": "NOUN|NounType=Class|l-obl",
+     "306": "NOUN|NounType=Class|r-acl",
+     "307": "NOUN|NounType=Class|r-advcl",
+     "308": "NOUN|NounType=Class|r-advmod",
+     "309": "NOUN|NounType=Class|r-appos",
+     "310": "NOUN|NounType=Class|r-cc",
+     "311": "NOUN|NounType=Class|r-ccomp",
+     "312": "NOUN|NounType=Class|r-clf",
+     "313": "NOUN|NounType=Class|r-compound",
+     "314": "NOUN|NounType=Class|r-conj",
+     "315": "NOUN|NounType=Class|r-dislocated",
+     "316": "NOUN|NounType=Class|r-fixed",
+     "317": "NOUN|NounType=Class|r-flat",
+     "318": "NOUN|NounType=Class|r-iobj",
+     "319": "NOUN|NounType=Class|r-list",
+     "320": "NOUN|NounType=Class|r-nmod",
+     "321": "NOUN|NounType=Class|r-nummod",
+     "322": "NOUN|NounType=Class|r-obj",
+     "323": "NOUN|NounType=Class|r-obl",
+     "324": "NOUN|NounType=Class|r-orphan",
+     "325": "NOUN|NounType=Class|r-xcomp",
+     "326": "NOUN|NounType=Class|root",
+     "327": "NOUN|NumType=Mult|_",
+     "328": "NOUN|NumType=Mult|r-advcl",
+     "329": "NOUN|NumType=Mult|r-nmod",
+     "330": "NOUN|NumType=Mult|r-obj",
+     "331": "NOUN|PartType=Enp|_",
+     "332": "NOUN|PartType=Enp|r-obj",
+     "333": "NOUN|PartType=Enp|r-obl",
+     "334": "NOUN|PartType=Int|_",
+     "335": "NOUN|PartType=Int|r-obj",
+     "336": "NOUN|PartType=Res|_",
+     "337": "NOUN|PartType=Res|r-nmod",
+     "338": "NOUN|PartType=Res|r-obj",
+     "339": "NOUN|Prefix=Yes|_",
+     "340": "NOUN|Prefix=Yes|l-acl",
+     "341": "NOUN|Prefix=Yes|l-advcl",
+     "342": "NOUN|Prefix=Yes|l-clf",
+     "343": "NOUN|Prefix=Yes|l-csubj",
+     "344": "NOUN|Prefix=Yes|l-dislocated",
+     "345": "NOUN|Prefix=Yes|l-flat",
+     "346": "NOUN|Prefix=Yes|l-nmod",
+     "347": "NOUN|Prefix=Yes|l-nsubj",
+     "348": "NOUN|Prefix=Yes|l-obj",
+     "349": "NOUN|Prefix=Yes|l-obl",
+     "350": "NOUN|Prefix=Yes|r-acl",
+     "351": "NOUN|Prefix=Yes|r-advcl",
+     "352": "NOUN|Prefix=Yes|r-advmod",
+     "353": "NOUN|Prefix=Yes|r-appos",
+     "354": "NOUN|Prefix=Yes|r-case",
+     "355": "NOUN|Prefix=Yes|r-cc",
+     "356": "NOUN|Prefix=Yes|r-ccomp",
+     "357": "NOUN|Prefix=Yes|r-clf",
+     "358": "NOUN|Prefix=Yes|r-compound",
+     "359": "NOUN|Prefix=Yes|r-conj",
+     "360": "NOUN|Prefix=Yes|r-dislocated",
+     "361": "NOUN|Prefix=Yes|r-fixed",
+     "362": "NOUN|Prefix=Yes|r-flat",
+     "363": "NOUN|Prefix=Yes|r-iobj",
+     "364": "NOUN|Prefix=Yes|r-list",
+     "365": "NOUN|Prefix=Yes|r-nmod",
+     "366": "NOUN|Prefix=Yes|r-nummod",
+     "367": "NOUN|Prefix=Yes|r-obj",
+     "368": "NOUN|Prefix=Yes|r-obl",
+     "369": "NOUN|Prefix=Yes|r-orphan",
+     "370": "NOUN|Prefix=Yes|r-xcomp",
+     "371": "NOUN|Prefix=Yes|root",
+     "372": "NOUN|_",
+     "373": "NOUN|l-acl",
+     "374": "NOUN|l-advcl",
+     "375": "NOUN|l-advmod",
+     "376": "NOUN|l-case",
+     "377": "NOUN|l-ccomp",
+     "378": "NOUN|l-compound",
+     "379": "NOUN|l-csubj",
+     "380": "NOUN|l-discourse",
+     "381": "NOUN|l-dislocated",
+     "382": "NOUN|l-expl",
+     "383": "NOUN|l-flat",
+     "384": "NOUN|l-iobj",
+     "385": "NOUN|l-mark",
+     "386": "NOUN|l-nmod",
+     "387": "NOUN|l-nsubj",
+     "388": "NOUN|l-nummod",
+     "389": "NOUN|l-obj",
+     "390": "NOUN|l-obl",
+     "391": "NOUN|l-orphan",
+     "392": "NOUN|l-vocative",
+     "393": "NOUN|r-acl",
+     "394": "NOUN|r-advcl",
+     "395": "NOUN|r-advmod",
+     "396": "NOUN|r-appos",
+     "397": "NOUN|r-case",
+     "398": "NOUN|r-cc",
+     "399": "NOUN|r-ccomp",
+     "400": "NOUN|r-clf",
+     "401": "NOUN|r-compound",
+     "402": "NOUN|r-conj",
+     "403": "NOUN|r-cop",
+     "404": "NOUN|r-discourse",
+     "405": "NOUN|r-dislocated",
+     "406": "NOUN|r-fixed",
+     "407": "NOUN|r-flat",
+     "408": "NOUN|r-iobj",
+     "409": "NOUN|r-list",
+     "410": "NOUN|r-mark",
+     "411": "NOUN|r-nmod",
+     "412": "NOUN|r-nsubj",
+     "413": "NOUN|r-nummod",
+     "414": "NOUN|r-obj",
+     "415": "NOUN|r-obl",
+     "416": "NOUN|r-orphan",
+     "417": "NOUN|r-parataxis",
+     "418": "NOUN|r-xcomp",
+     "419": "NOUN|root",
+     "420": "NUM",
+     "421": "NUM.",
+     "422": "NUM|Abbr=Yes|_",
+     "423": "NUM|Abbr=Yes|r-flat",
+     "424": "NUM|Abbr=Yes|r-nummod",
+     "425": "NUM|Abbr=Yes|r-obj",
+     "426": "NUM|Foreign=Yes|_",
+     "427": "NUM|Foreign=Yes|r-clf",
+     "428": "NUM|NumType=Mult|_",
+     "429": "NUM|NumType=Mult|l-advmod",
+     "430": "NUM|NumType=Mult|l-nummod",
+     "431": "NUM|NumType=Mult|r-advmod",
+     "432": "NUM|Prefix=Yes|_",
+     "433": "NUM|Prefix=Yes|l-nummod",
+     "434": "NUM|_",
+     "435": "NUM|l-advcl",
+     "436": "NUM|l-advmod",
+     "437": "NUM|l-case",
+     "438": "NUM|l-clf",
+     "439": "NUM|l-dep",
+     "440": "NUM|l-flat",
+     "441": "NUM|l-nmod",
+     "442": "NUM|l-nsubj",
+     "443": "NUM|l-nummod",
+     "444": "NUM|l-obl",
+     "445": "NUM|r-acl",
+     "446": "NUM|r-advmod",
+     "447": "NUM|r-appos",
+     "448": "NUM|r-ccomp",
+     "449": "NUM|r-compound",
+     "450": "NUM|r-conj",
+     "451": "NUM|r-det",
+     "452": "NUM|r-fixed",
+     "453": "NUM|r-flat",
+     "454": "NUM|r-iobj",
+     "455": "NUM|r-nmod",
+     "456": "NUM|r-nummod",
+     "457": "NUM|r-obj",
+     "458": "NUM|r-obl",
+     "459": "NUM|root",
+     "460": "PART",
+     "461": "PART.",
+     "462": "PART|NameType=Oth|_",
+     "463": "PART|NameType=Oth|l-advmod",
+     "464": "PART|NounType=Class|PartType=Emp|Prefix=Yes|_",
+     "465": "PART|NounType=Class|PartType=Emp|Prefix=Yes|l-mark",
+     "466": "PART|NounType=Class|PartType=Emp|_",
+     "467": "PART|NounType=Class|PartType=Emp|l-mark",
+     "468": "PART|NounType=Class|Prefix=Yes|_",
+     "469": "PART|NounType=Class|Prefix=Yes|l-mark",
+     "470": "PART|NumType=Mult|PartType=Emp|_",
+     "471": "PART|NumType=Mult|PartType=Emp|l-mark",
+     "472": "PART|PartType=Adj|_",
+     "473": "PART|PartType=Adj|l-mark",
+     "474": "PART|PartType=Adj|l-orphan",
+     "475": "PART|PartType=Adj|r-acl",
+     "476": "PART|PartType=Adj|r-compound",
+     "477": "PART|PartType=Adj|r-nmod",
+     "478": "PART|PartType=Adv|_",
+     "479": "PART|PartType=Adv|l-advmod",
+     "480": "PART|PartType=Adv|l-mark",
+     "481": "PART|PartType=Adv|r-advmod",
+     "482": "PART|PartType=Emp|Prefix=Yes|_",
+     "483": "PART|PartType=Emp|Prefix=Yes|l-advmod",
+     "484": "PART|PartType=Emp|Prefix=Yes|l-aux",
+     "485": "PART|PartType=Emp|Prefix=Yes|l-mark",
+     "486": "PART|PartType=Emp|_",
+     "487": "PART|PartType=Emp|l-advmod",
+     "488": "PART|PartType=Emp|l-case",
+     "489": "PART|PartType=Emp|l-discourse",
+     "490": "PART|PartType=Emp|l-mark",
+     "491": "PART|PartType=Emp|r-acl",
+     "492": "PART|PartType=Emp|r-advmod",
+     "493": "PART|PartType=Emp|r-aux",
+     "494": "PART|PartType=Emp|r-compound",
+     "495": "PART|PartType=Emp|r-det",
+     "496": "PART|PartType=Emp|r-fixed",
+     "497": "PART|PartType=Emp|r-mark",
+     "498": "PART|PartType=Emp|r-nmod",
+     "499": "PART|PartType=Enp|_",
+     "500": "PART|PartType=Enp|l-discourse",
+     "501": "PART|PartType=Enp|r-acl",
+     "502": "PART|PartType=Enp|r-advmod",
+     "503": "PART|PartType=Enp|r-compound",
+     "504": "PART|PartType=Enp|r-dep",
+     "505": "PART|PartType=Enp|r-det",
+     "506": "PART|PartType=Enp|r-discourse",
+     "507": "PART|PartType=Enp|r-fixed",
+     "508": "PART|PartType=Enp|r-obl",
+     "509": "PART|PartType=Int|_",
+     "510": "PART|PartType=Int|l-advmod",
+     "511": "PART|PartType=Int|l-mark",
+     "512": "PART|PartType=Int|r-acl",
+     "513": "PART|PartType=Int|r-advmod",
+     "514": "PART|PartType=Int|r-dep",
+     "515": "PART|PartType=Int|r-discourse",
+     "516": "PART|PartType=Int|r-nmod",
+     "517": "PART|PartType=Int|r-obj",
+     "518": "PART|PartType=Int|r-obl",
+     "519": "PART|PartType=Neg|_",
+     "520": "PART|PartType=Neg|l-advcl",
+     "521": "PART|PartType=Neg|l-advmod",
+     "522": "PART|PartType=Neg|l-aux",
+     "523": "PART|PartType=Neg|l-mark",
+     "524": "PART|PartType=Neg|r-acl",
+     "525": "PART|PartType=Neg|r-advmod",
+     "526": "PART|PartType=Neg|r-fixed",
+     "527": "PART|PartType=Res|_",
+     "528": "PART|PartType=Res|r-advmod",
+     "529": "PART|PartType=Res|r-discourse",
+     "530": "PART|PartType=Res|r-fixed",
+     "531": "PART|Prefix=Yes|_",
+     "532": "PART|Prefix=Yes|l-advmod",
+     "533": "PART|Prefix=Yes|l-aux",
+     "534": "PART|Prefix=Yes|l-mark",
+     "535": "PART|Prefix=Yes|r-acl",
+     "536": "PART|Prefix=Yes|r-nmod",
+     "537": "PART|_",
+     "538": "PART|l-advmod",
+     "539": "PART|l-discourse",
+     "540": "PART|l-mark",
+     "541": "PART|l-nsubj",
+     "542": "PART|r-acl",
+     "543": "PART|r-advmod",
+     "544": "PART|r-discourse",
+     "545": "PART|r-fixed",
+     "546": "PART|r-mark",
+     "547": "PART|r-obj",
+     "548": "PRON",
+     "549": "PRON.",
+     "550": "PRON|NounType=Class|_",
+     "551": "PRON|NounType=Class|r-clf",
+     "552": "PRON|PronType=Prs|_",
+     "553": "PRON|PronType=Prs|l-advmod",
+     "554": "PRON|PronType=Prs|l-expl",
+     "555": "PRON|PronType=Prs|l-nsubj",
+     "556": "PRON|PronType=Prs|l-obj",
+     "557": "PRON|PronType=Prs|l-obl",
+     "558": "PRON|PronType=Prs|r-advcl",
+     "559": "PRON|PronType=Prs|r-advmod",
+     "560": "PRON|PronType=Prs|r-ccomp",
+     "561": "PRON|PronType=Prs|r-clf",
+     "562": "PRON|PronType=Prs|r-conj",
+     "563": "PRON|PronType=Prs|r-nmod",
+     "564": "PRON|PronType=Prs|r-nsubj",
+     "565": "PRON|PronType=Prs|r-obj",
+     "566": "PRON|PronType=Prs|r-obl",
+     "567": "PRON|PronType=Prs|root",
+     "568": "PRON|PronType=Rcp|_",
+     "569": "PRON|PronType=Rcp|r-advmod",
+     "570": "PRON|PronType=Rcp|r-iobj",
+     "571": "PRON|PronType=Rcp|r-nmod",
+     "572": "PRON|PronType=Rcp|r-obj",
+     "573": "PRON|PronType=Rcp|r-obl",
+     "574": "PRON|_",
+     "575": "PRON|l-advcl",
+     "576": "PRON|l-advmod",
+     "577": "PRON|l-compound",
+     "578": "PRON|l-csubj",
+     "579": "PRON|l-dislocated",
+     "580": "PRON|l-expl",
+     "581": "PRON|l-iobj",
+     "582": "PRON|l-mark",
+     "583": "PRON|l-nsubj",
+     "584": "PRON|l-obj",
+     "585": "PRON|l-obl",
+     "586": "PRON|r-acl",
+     "587": "PRON|r-advmod",
+     "588": "PRON|r-appos",
+     "589": "PRON|r-ccomp",
+     "590": "PRON|r-compound",
+     "591": "PRON|r-conj",
+     "592": "PRON|r-det",
+     "593": "PRON|r-discourse",
+     "594": "PRON|r-fixed",
+     "595": "PRON|r-flat",
+     "596": "PRON|r-iobj",
+     "597": "PRON|r-nmod",
+     "598": "PRON|r-nsubj",
+     "599": "PRON|r-obj",
+     "600": "PRON|r-obl",
+     "601": "PROPN",
+     "602": "PROPN.",
+     "603": "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth|_",
+     "604": "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth|r-obj",
+     "605": "PROPN|Abbr=Yes|NameType=Com|_",
+     "606": "PROPN|Abbr=Yes|NameType=Com|r-advmod",
+     "607": "PROPN|Abbr=Yes|NameType=Com|r-nmod",
+     "608": "PROPN|Abbr=Yes|_",
+     "609": "PROPN|Abbr=Yes|l-nmod",
+     "610": "PROPN|Abbr=Yes|l-nsubj",
+     "611": "PROPN|Abbr=Yes|r-nmod",
+     "612": "PROPN|Foreign=Yes|NameType=Com|_",
+     "613": "PROPN|Foreign=Yes|NameType=Com|l-nsubj",
+     "614": "PROPN|Foreign=Yes|NameType=Com|r-list",
+     "615": "PROPN|Foreign=Yes|NameType=Com|r-nmod",
+     "616": "PROPN|Foreign=Yes|NameType=Com|r-obl",
+     "617": "PROPN|Foreign=Yes|NameType=Geo|_",
+     "618": "PROPN|Foreign=Yes|NameType=Geo|r-obj",
+     "619": "PROPN|Foreign=Yes|NameType=Geo|r-obl",
+     "620": "PROPN|Foreign=Yes|NameType=Giv|_",
+     "621": "PROPN|Foreign=Yes|NameType=Giv|l-nsubj",
+     "622": "PROPN|Foreign=Yes|NameType=Oth|_",
+     "623": "PROPN|Foreign=Yes|NameType=Oth|r-conj",
+     "624": "PROPN|Foreign=Yes|NameType=Oth|r-flat",
+     "625": "PROPN|Foreign=Yes|NameType=Oth|r-nmod",
+     "626": "PROPN|Foreign=Yes|NameType=Prs|_",
+     "627": "PROPN|Foreign=Yes|NameType=Prs|l-flat",
+     "628": "PROPN|Foreign=Yes|NameType=Prs|l-nsubj",
+     "629": "PROPN|Foreign=Yes|NameType=Prs|r-conj",
+     "630": "PROPN|Foreign=Yes|NameType=Prs|r-flat",
+     "631": "PROPN|Foreign=Yes|NameType=Prs|r-nmod",
+     "632": "PROPN|Foreign=Yes|NameType=Prs|r-obj",
+     "633": "PROPN|Foreign=Yes|NameType=Prs|r-obl",
+     "634": "PROPN|Foreign=Yes|NameType=Sur|_",
+     "635": "PROPN|Foreign=Yes|NameType=Sur|r-flat",
+     "636": "PROPN|Foreign=Yes|_",
+     "637": "PROPN|Foreign=Yes|l-flat",
+     "638": "PROPN|Foreign=Yes|l-nmod",
+     "639": "PROPN|Foreign=Yes|l-nsubj",
+     "640": "PROPN|Foreign=Yes|l-obl",
+     "641": "PROPN|Foreign=Yes|r-appos",
+     "642": "PROPN|Foreign=Yes|r-ccomp",
+     "643": "PROPN|Foreign=Yes|r-compound",
+     "644": "PROPN|Foreign=Yes|r-conj",
+     "645": "PROPN|Foreign=Yes|r-flat",
+     "646": "PROPN|Foreign=Yes|r-iobj",
+     "647": "PROPN|Foreign=Yes|r-list",
+     "648": "PROPN|Foreign=Yes|r-nmod",
+     "649": "PROPN|Foreign=Yes|r-nsubj",
+     "650": "PROPN|Foreign=Yes|r-obj",
+     "651": "PROPN|Foreign=Yes|r-obl",
+     "652": "PROPN|Foreign=Yes|root",
+     "653": "PROPN|NameType=Com|_",
+     "654": "PROPN|NameType=Com|l-nsubj",
+     "655": "PROPN|NameType=Com|l-obl",
+     "656": "PROPN|NameType=Com|r-appos",
+     "657": "PROPN|NameType=Com|r-conj",
+     "658": "PROPN|NameType=Com|r-flat",
+     "659": "PROPN|NameType=Com|r-list",
+     "660": "PROPN|NameType=Com|r-nmod",
+     "661": "PROPN|NameType=Com|r-nsubj",
+     "662": "PROPN|NameType=Com|r-obj",
+     "663": "PROPN|NameType=Com|r-obl",
+     "664": "PROPN|NameType=Geo|_",
+     "665": "PROPN|NameType=Geo|l-nsubj",
+     "666": "PROPN|NameType=Geo|l-obl",
+     "667": "PROPN|NameType=Geo|r-compound",
+     "668": "PROPN|NameType=Geo|r-conj",
+     "669": "PROPN|NameType=Geo|r-flat",
+     "670": "PROPN|NameType=Geo|r-list",
+     "671": "PROPN|NameType=Geo|r-nmod",
+     "672": "PROPN|NameType=Geo|r-nsubj",
+     "673": "PROPN|NameType=Geo|r-nummod",
+     "674": "PROPN|NameType=Geo|r-obj",
+     "675": "PROPN|NameType=Geo|r-obl",
+     "676": "PROPN|NameType=Geo|root",
+     "677": "PROPN|NameType=Giv|_",
+     "678": "PROPN|NameType=Giv|l-dislocated",
+     "679": "PROPN|NameType=Giv|l-nsubj",
+     "680": "PROPN|NameType=Giv|l-obl",
+     "681": "PROPN|NameType=Giv|r-acl",
+     "682": "PROPN|NameType=Giv|r-appos",
+     "683": "PROPN|NameType=Giv|r-ccomp",
+     "684": "PROPN|NameType=Giv|r-conj",
+     "685": "PROPN|NameType=Giv|r-flat",
+     "686": "PROPN|NameType=Giv|r-list",
+     "687": "PROPN|NameType=Giv|r-nmod",
+     "688": "PROPN|NameType=Giv|r-nsubj",
+     "689": "PROPN|NameType=Giv|r-obj",
+     "690": "PROPN|NameType=Giv|r-obl",
+     "691": "PROPN|NameType=Giv|root",
+     "692": "PROPN|NameType=Nat|_",
+     "693": "PROPN|NameType=Nat|l-csubj",
+     "694": "PROPN|NameType=Nat|l-nsubj",
+     "695": "PROPN|NameType=Nat|l-obl",
+     "696": "PROPN|NameType=Nat|r-acl",
+     "697": "PROPN|NameType=Nat|r-appos",
+     "698": "PROPN|NameType=Nat|r-compound",
+     "699": "PROPN|NameType=Nat|r-conj",
+     "700": "PROPN|NameType=Nat|r-flat",
+     "701": "PROPN|NameType=Nat|r-list",
+     "702": "PROPN|NameType=Nat|r-nmod",
+     "703": "PROPN|NameType=Nat|r-nummod",
+     "704": "PROPN|NameType=Nat|r-obj",
+     "705": "PROPN|NameType=Nat|r-obl",
+     "706": "PROPN|NameType=Oth|_",
+     "707": "PROPN|NameType=Oth|l-dislocated",
+     "708": "PROPN|NameType=Oth|l-nsubj",
+     "709": "PROPN|NameType=Oth|r-acl",
+     "710": "PROPN|NameType=Oth|r-appos",
+     "711": "PROPN|NameType=Oth|r-compound",
+     "712": "PROPN|NameType=Oth|r-conj",
+     "713": "PROPN|NameType=Oth|r-flat",
+     "714": "PROPN|NameType=Oth|r-nmod",
+     "715": "PROPN|NameType=Oth|r-obj",
+     "716": "PROPN|NameType=Oth|r-obl",
+     "717": "PROPN|NameType=Oth|root",
+     "718": "PROPN|NameType=Pro|_",
+     "719": "PROPN|NameType=Pro|l-nsubj",
+     "720": "PROPN|NameType=Pro|l-obl",
+     "721": "PROPN|NameType=Pro|r-advcl",
+     "722": "PROPN|NameType=Pro|r-flat",
+     "723": "PROPN|NameType=Pro|r-nmod",
+     "724": "PROPN|NameType=Pro|r-obj",
+     "725": "PROPN|NameType=Prs|_",
+     "726": "PROPN|NameType=Prs|l-dislocated",
+     "727": "PROPN|NameType=Prs|l-nsubj",
+     "728": "PROPN|NameType=Prs|l-obl",
+     "729": "PROPN|NameType=Prs|l-vocative",
+     "730": "PROPN|NameType=Prs|r-conj",
+     "731": "PROPN|NameType=Prs|r-discourse",
+     "732": "PROPN|NameType=Prs|r-flat",
+     "733": "PROPN|NameType=Prs|r-list",
+     "734": "PROPN|NameType=Prs|r-nmod",
+     "735": "PROPN|NameType=Prs|r-obj",
+     "736": "PROPN|NameType=Prs|r-obl",
+     "737": "PROPN|NameType=Prs|r-vocative",
+     "738": "PROPN|NameType=Sur|_",
+     "739": "PROPN|NameType=Sur|l-nsubj",
+     "740": "PROPN|NameType=Sur|r-flat",
+     "741": "PROPN|NameType=Sur|r-nmod",
+     "742": "PROPN|NounType=Class|_",
+     "743": "PROPN|NounType=Class|r-clf",
+     "744": "PROPN|Prefix=Yes|_",
+     "745": "PROPN|Prefix=Yes|l-nsubj",
+     "746": "PROPN|Prefix=Yes|r-nmod",
+     "747": "PROPN|_",
+     "748": "PROPN|l-advmod",
+     "749": "PROPN|l-nsubj",
+     "750": "PROPN|l-obl",
+     "751": "PROPN|r-acl",
+     "752": "PROPN|r-advmod",
+     "753": "PROPN|r-appos",
+     "754": "PROPN|r-clf",
+     "755": "PROPN|r-compound",
+     "756": "PROPN|r-conj",
+     "757": "PROPN|r-fixed",
+     "758": "PROPN|r-flat",
+     "759": "PROPN|r-iobj",
+     "760": "PROPN|r-list",
+     "761": "PROPN|r-nmod",
+     "762": "PROPN|r-obj",
+     "763": "PROPN|r-obl",
+     "764": "PROPN|root",
+ "765": "PUNCT",
+ "766": "PUNCT.",
+ "767": "PUNCT|NounType=Class|_",
+ "768": "PUNCT|NounType=Class|r-punct",
+ "769": "PUNCT|_",
+ "770": "PUNCT|l-advmod",
+ "771": "PUNCT|l-dep",
+ "772": "PUNCT|l-punct",
+ "773": "PUNCT|r-dep",
+ "774": "PUNCT|r-punct",
+ "775": "SCONJ",
+ "776": "SCONJ.",
+ "777": "SCONJ|NumType=Mult|_",
+ "778": "SCONJ|NumType=Mult|l-mark",
+ "779": "SCONJ|Prefix=Yes|_",
+ "780": "SCONJ|Prefix=Yes|l-cc",
+ "781": "SCONJ|Prefix=Yes|l-mark",
+ "782": "SCONJ|VerbType=Cop|_",
+ "783": "SCONJ|VerbType=Cop|l-mark",
+ "784": "SCONJ|_",
+ "785": "SCONJ|l-advmod",
+ "786": "SCONJ|l-case",
+ "787": "SCONJ|l-cc",
+ "788": "SCONJ|l-discourse",
+ "789": "SCONJ|l-mark",
+ "790": "SCONJ|l-nsubj",
+ "791": "SCONJ|l-orphan",
+ "792": "SCONJ|r-advcl",
+ "793": "SCONJ|r-compound",
+ "794": "SCONJ|r-fixed",
+ "795": "SCONJ|r-flat",
+ "796": "SCONJ|r-mark",
+ "797": "SCONJ|r-orphan",
+ "798": "SCONJ|root",
+ "799": "SYM",
+ "800": "SYM.",
+ "801": "SYM|_",
+ "802": "SYM|l-dep",
+ "803": "SYM|r-clf",
+ "804": "SYM|r-nmod",
+ "805": "SYM|r-obj",
+ "806": "SYM|r-obl",
+ "807": "SYM|r-xcomp",
+ "808": "VERB",
+ "809": "VERB.",
+ "810": "VERB|Abbr=Yes|_",
+ "811": "VERB|Abbr=Yes|r-acl",
+ "812": "VERB|Foreign=Yes|_",
+ "813": "VERB|Foreign=Yes|l-nsubj",
+ "814": "VERB|Foreign=Yes|r-acl",
+ "815": "VERB|Foreign=Yes|r-advcl",
+ "816": "VERB|Foreign=Yes|r-ccomp",
+ "817": "VERB|Foreign=Yes|r-compound",
+ "818": "VERB|Foreign=Yes|r-conj",
+ "819": "VERB|Foreign=Yes|r-flat",
+ "820": "VERB|Foreign=Yes|r-nmod",
+ "821": "VERB|Foreign=Yes|r-xcomp",
+ "822": "VERB|Foreign=Yes|root",
+ "823": "VERB|NounType=Class|_",
+ "824": "VERB|NounType=Class|r-acl",
+ "825": "VERB|NounType=Class|r-compound",
+ "826": "VERB|PartType=Adj|_",
+ "827": "VERB|PartType=Adj|r-acl",
+ "828": "VERB|Prefix=Yes|_",
+ "829": "VERB|Prefix=Yes|l-acl",
+ "830": "VERB|Prefix=Yes|l-nsubj",
+ "831": "VERB|Prefix=Yes|r-acl",
+ "832": "VERB|Prefix=Yes|r-advcl",
+ "833": "VERB|Prefix=Yes|r-ccomp",
+ "834": "VERB|Prefix=Yes|r-compound",
+ "835": "VERB|Prefix=Yes|r-conj",
+ "836": "VERB|Prefix=Yes|r-parataxis",
+ "837": "VERB|Prefix=Yes|root",
+ "838": "VERB|VerbType=Cop|_",
+ "839": "VERB|VerbType=Cop|l-advmod",
+ "840": "VERB|VerbType=Cop|l-cop",
+ "841": "VERB|VerbType=Cop|r-acl",
+ "842": "VERB|VerbType=Cop|r-advcl",
+ "843": "VERB|VerbType=Cop|r-ccomp",
+ "844": "VERB|VerbType=Cop|r-compound",
+ "845": "VERB|VerbType=Cop|r-parataxis",
+ "846": "VERB|VerbType=Cop|root",
+ "847": "VERB|_",
+ "848": "VERB|l-acl",
+ "849": "VERB|l-advcl",
+ "850": "VERB|l-advmod",
+ "851": "VERB|l-case",
+ "852": "VERB|l-cc",
+ "853": "VERB|l-ccomp",
+ "854": "VERB|l-compound",
+ "855": "VERB|l-conj",
+ "856": "VERB|l-cop",
+ "857": "VERB|l-csubj",
+ "858": "VERB|l-discourse",
+ "859": "VERB|l-dislocated",
+ "860": "VERB|l-mark",
+ "861": "VERB|l-nsubj",
+ "862": "VERB|l-obl",
+ "863": "VERB|l-orphan",
+ "864": "VERB|l-xcomp",
+ "865": "VERB|r-acl",
+ "866": "VERB|r-advcl",
+ "867": "VERB|r-advmod",
+ "868": "VERB|r-appos",
+ "869": "VERB|r-case",
+ "870": "VERB|r-cc",
+ "871": "VERB|r-ccomp",
+ "872": "VERB|r-clf",
+ "873": "VERB|r-compound",
+ "874": "VERB|r-conj",
+ "875": "VERB|r-dep",
+ "876": "VERB|r-det",
+ "877": "VERB|r-discourse",
+ "878": "VERB|r-fixed",
+ "879": "VERB|r-flat",
+ "880": "VERB|r-list",
+ "881": "VERB|r-mark",
+ "882": "VERB|r-nmod",
+ "883": "VERB|r-nsubj",
+ "884": "VERB|r-obj",
+ "885": "VERB|r-obl",
+ "886": "VERB|r-orphan",
+ "887": "VERB|r-parataxis",
+ "888": "VERB|r-punct",
+ "889": "VERB|r-xcomp",
+ "890": "VERB|root"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 8192,
+ "label2id": {
+ "ADP": 0,
+ "ADP.": 1,
+ "ADP|Foreign=Yes|_": 2,
+ "ADP|Foreign=Yes|l-case": 3,
+ "ADP|NounType=Class|_": 4,
+ "ADP|NounType=Class|l-case": 5,
+ "ADP|Prefix=Yes|_": 6,
+ "ADP|Prefix=Yes|l-case": 7,
+ "ADP|Prefix=Yes|l-mark": 8,
+ "ADP|_": 9,
+ "ADP|l-acl": 10,
+ "ADP|l-advcl": 11,
+ "ADP|l-advmod": 12,
+ "ADP|l-case": 13,
+ "ADP|l-cc": 14,
+ "ADP|l-dep": 15,
+ "ADP|l-fixed": 16,
+ "ADP|l-flat": 17,
+ "ADP|l-mark": 18,
+ "ADP|l-nmod": 19,
+ "ADP|l-nsubj": 20,
+ "ADP|l-obl": 21,
+ "ADP|l-orphan": 22,
+ "ADP|r-acl": 23,
+ "ADP|r-advmod": 24,
+ "ADP|r-case": 25,
+ "ADP|r-compound": 26,
+ "ADP|r-conj": 27,
+ "ADP|r-fixed": 28,
+ "ADP|r-flat": 29,
+ "ADP|r-obl": 30,
+ "ADP|r-orphan": 31,
+ "ADP|root": 32,
+ "ADV": 33,
+ "ADV.": 34,
+ "ADV|Foreign=Yes|_": 35,
+ "ADV|Foreign=Yes|l-advmod": 36,
+ "ADV|Foreign=Yes|r-advmod": 37,
+ "ADV|NumType=Mult|_": 38,
+ "ADV|NumType=Mult|r-advmod": 39,
+ "ADV|PartType=Adv|_": 40,
+ "ADV|PartType=Adv|l-advmod": 41,
+ "ADV|PartType=Adv|l-mark": 42,
+ "ADV|PartType=Adv|r-advmod": 43,
+ "ADV|PartType=Enp|_": 44,
+ "ADV|PartType=Enp|l-advmod": 45,
+ "ADV|PartType=Enp|r-advmod": 46,
+ "ADV|PartType=Int|_": 47,
+ "ADV|PartType=Int|r-advmod": 48,
+ "ADV|PartType=Int|r-fixed": 49,
+ "ADV|Prefix=Yes|_": 50,
+ "ADV|Prefix=Yes|l-advmod": 51,
+ "ADV|Prefix=Yes|l-mark": 52,
+ "ADV|Prefix=Yes|r-advmod": 53,
+ "ADV|_": 54,
+ "ADV|l-acl": 55,
+ "ADV|l-advcl": 56,
+ "ADV|l-advmod": 57,
+ "ADV|l-aux": 58,
+ "ADV|l-case": 59,
+ "ADV|l-compound": 60,
+ "ADV|l-dep": 61,
+ "ADV|l-det": 62,
+ "ADV|l-discourse": 63,
+ "ADV|l-fixed": 64,
+ "ADV|l-mark": 65,
+ "ADV|l-orphan": 66,
+ "ADV|r-acl": 67,
+ "ADV|r-advcl": 68,
+ "ADV|r-advmod": 69,
+ "ADV|r-aux": 70,
+ "ADV|r-compound": 71,
+ "ADV|r-conj": 72,
+ "ADV|r-det": 73,
+ "ADV|r-fixed": 74,
+ "ADV|r-flat": 75,
+ "ADV|r-mark": 76,
+ "ADV|r-nmod": 77,
+ "ADV|r-obj": 78,
+ "ADV|r-orphan": 79,
+ "ADV|r-xcomp": 80,
+ "ADV|root": 81,
+ "AUX": 82,
+ "AUX.": 83,
+ "AUX|Foreign=Yes|_": 84,
+ "AUX|Foreign=Yes|l-aux": 85,
+ "AUX|NounType=Class|_": 86,
+ "AUX|NounType=Class|r-appos": 87,
+ "AUX|Prefix=Yes|_": 88,
+ "AUX|Prefix=Yes|l-aux": 89,
+ "AUX|Prefix=Yes|r-aux": 90,
+ "AUX|VerbType=Cop|_": 91,
+ "AUX|VerbType=Cop|l-acl": 92,
+ "AUX|VerbType=Cop|l-advcl": 93,
+ "AUX|VerbType=Cop|l-aux": 94,
+ "AUX|VerbType=Cop|l-cop": 95,
+ "AUX|VerbType=Cop|r-acl": 96,
+ "AUX|VerbType=Cop|r-advcl": 97,
+ "AUX|VerbType=Cop|r-aux": 98,
+ "AUX|VerbType=Cop|r-conj": 99,
+ "AUX|VerbType=Cop|r-mark": 100,
+ "AUX|VerbType=Cop|root": 101,
+ "AUX|_": 102,
+ "AUX|l-advmod": 103,
+ "AUX|l-aux": 104,
+ "AUX|l-cop": 105,
+ "AUX|l-mark": 106,
+ "AUX|r-acl": 107,
+ "AUX|r-advmod": 108,
+ "AUX|r-aux": 109,
+ "AUX|r-ccomp": 110,
+ "AUX|r-clf": 111,
+ "AUX|r-compound": 112,
+ "AUX|r-conj": 113,
+ "AUX|r-fixed": 114,
+ "AUX|root": 115,
+ "B-ADP": 116,
+ "B-ADP.": 117,
+ "B-ADV": 118,
+ "B-ADV.": 119,
+ "B-AUX": 120,
+ "B-AUX.": 121,
+ "B-CCONJ": 122,
+ "B-CCONJ.": 123,
+ "B-DET": 124,
+ "B-DET.": 125,
+ "B-NOUN": 126,
+ "B-NOUN.": 127,
+ "B-NUM": 128,
+ "B-NUM.": 129,
+ "B-PART": 130,
+ "B-PART.": 131,
+ "B-PRON": 132,
+ "B-PRON.": 133,
+ "B-PROPN": 134,
+ "B-PROPN.": 135,
+ "B-PUNCT": 136,
+ "B-PUNCT.": 137,
+ "B-SCONJ": 138,
+ "B-SCONJ.": 139,
+ "B-SYM": 140,
+ "B-SYM.": 141,
+ "B-VERB": 142,
+ "B-VERB.": 143,
+ "CCONJ": 144,
+ "CCONJ.": 145,
+ "CCONJ|Foreign=Yes|_": 146,
+ "CCONJ|Foreign=Yes|l-cc": 147,
+ "CCONJ|PronType=Prs|_": 148,
+ "CCONJ|PronType=Prs|l-cc": 149,
+ "CCONJ|_": 150,
+ "CCONJ|l-advmod": 151,
+ "CCONJ|l-case": 152,
+ "CCONJ|l-cc": 153,
+ "CCONJ|l-conj": 154,
+ "CCONJ|l-discourse": 155,
+ "CCONJ|l-fixed": 156,
+ "CCONJ|l-flat": 157,
+ "CCONJ|l-mark": 158,
+ "CCONJ|l-nsubj": 159,
+ "CCONJ|l-obj": 160,
+ "CCONJ|l-orphan": 161,
+ "CCONJ|r-cc": 162,
+ "CCONJ|r-compound": 163,
+ "CCONJ|r-fixed": 164,
+ "CCONJ|r-mark": 165,
+ "DET": 166,
+ "DET.": 167,
+ "DET|NumType=Mult|_": 168,
+ "DET|NumType=Mult|l-det": 169,
+ "DET|PartType=Emp|_": 170,
+ "DET|PartType=Emp|r-det": 171,
+ "DET|PartType=Int|_": 172,
+ "DET|PartType=Int|r-det": 173,
+ "DET|_": 174,
+ "DET|l-compound": 175,
+ "DET|l-det": 176,
+ "DET|l-discourse": 177,
+ "DET|l-nsubj": 178,
+ "DET|l-obl": 179,
+ "DET|l-orphan": 180,
+ "DET|r-advmod": 181,
+ "DET|r-compound": 182,
+ "DET|r-conj": 183,
+ "DET|r-dep": 184,
+ "DET|r-det": 185,
+ "DET|r-fixed": 186,
+ "DET|r-flat": 187,
+ "DET|r-list": 188,
+ "DET|r-nmod": 189,
+ "DET|r-nummod": 190,
+ "DET|r-obl": 191,
+ "DET|r-orphan": 192,
+ "I-ADP": 193,
+ "I-ADP.": 194,
+ "I-ADV": 195,
+ "I-ADV.": 196,
+ "I-AUX": 197,
+ "I-AUX.": 198,
+ "I-CCONJ": 199,
+ "I-CCONJ.": 200,
+ "I-DET": 201,
+ "I-DET.": 202,
+ "I-NOUN": 203,
+ "I-NOUN.": 204,
+ "I-NUM": 205,
+ "I-NUM.": 206,
+ "I-PART": 207,
+ "I-PART.": 208,
+ "I-PRON": 209,
+ "I-PRON.": 210,
+ "I-PROPN": 211,
+ "I-PROPN.": 212,
+ "I-PUNCT": 213,
+ "I-PUNCT.": 214,
+ "I-SCONJ": 215,
+ "I-SCONJ.": 216,
+ "I-SYM": 217,
+ "I-SYM.": 218,
+ "I-VERB": 219,
+ "I-VERB.": 220,
+ "NOUN": 221,
+ "NOUN.": 222,
+ "NOUN|Abbr=Yes|Foreign=Yes|_": 223,
+ "NOUN|Abbr=Yes|Foreign=Yes|r-nmod": 224,
+ "NOUN|Abbr=Yes|Prefix=Yes|_": 225,
+ "NOUN|Abbr=Yes|Prefix=Yes|l-flat": 226,
+ "NOUN|Abbr=Yes|_": 227,
+ "NOUN|Abbr=Yes|l-flat": 228,
+ "NOUN|Abbr=Yes|l-nmod": 229,
+ "NOUN|Abbr=Yes|l-nsubj": 230,
+ "NOUN|Abbr=Yes|l-obl": 231,
+ "NOUN|Abbr=Yes|r-acl": 232,
+ "NOUN|Abbr=Yes|r-appos": 233,
+ "NOUN|Abbr=Yes|r-clf": 234,
+ "NOUN|Abbr=Yes|r-conj": 235,
+ "NOUN|Abbr=Yes|r-fixed": 236,
+ "NOUN|Abbr=Yes|r-flat": 237,
+ "NOUN|Abbr=Yes|r-nmod": 238,
+ "NOUN|Abbr=Yes|r-obj": 239,
+ "NOUN|Abbr=Yes|r-obl": 240,
+ "NOUN|Foreign=Yes|NounType=Class|_": 241,
+ "NOUN|Foreign=Yes|NounType=Class|r-clf": 242,
+ "NOUN|Foreign=Yes|NounType=Class|r-obj": 243,
+ "NOUN|Foreign=Yes|Prefix=Yes|_": 244,
+ "NOUN|Foreign=Yes|Prefix=Yes|l-flat": 245,
+ "NOUN|Foreign=Yes|Prefix=Yes|r-appos": 246,
+ "NOUN|Foreign=Yes|_": 247,
+ "NOUN|Foreign=Yes|l-dislocated": 248,
+ "NOUN|Foreign=Yes|l-flat": 249,
+ "NOUN|Foreign=Yes|l-nmod": 250,
+ "NOUN|Foreign=Yes|l-nsubj": 251,
+ "NOUN|Foreign=Yes|l-obl": 252,
+ "NOUN|Foreign=Yes|r-acl": 253,
+ "NOUN|Foreign=Yes|r-advcl": 254,
+ "NOUN|Foreign=Yes|r-advmod": 255,
+ "NOUN|Foreign=Yes|r-appos": 256,
+ "NOUN|Foreign=Yes|r-ccomp": 257,
+ "NOUN|Foreign=Yes|r-clf": 258,
+ "NOUN|Foreign=Yes|r-compound": 259,
+ "NOUN|Foreign=Yes|r-conj": 260,
+ "NOUN|Foreign=Yes|r-flat": 261,
+ "NOUN|Foreign=Yes|r-iobj": 262,
+ "NOUN|Foreign=Yes|r-list": 263,
+ "NOUN|Foreign=Yes|r-nmod": 264,
+ "NOUN|Foreign=Yes|r-obj": 265,
+ "NOUN|Foreign=Yes|r-obl": 266,
+ "NOUN|Foreign=Yes|r-xcomp": 267,
+ "NOUN|Foreign=Yes|root": 268,
+ "NOUN|NameType=Com|_": 269,
+ "NOUN|NameType=Com|r-nmod": 270,
+ "NOUN|NameType=Geo|_": 271,
+ "NOUN|NameType=Geo|l-nsubj": 272,
+ "NOUN|NameType=Geo|r-nmod": 273,
+ "NOUN|NameType=Geo|r-obj": 274,
+ "NOUN|NameType=Nat|_": 275,
+ "NOUN|NameType=Nat|r-nmod": 276,
+ "NOUN|NameType=Oth|_": 277,
+ "NOUN|NameType=Oth|l-nsubj": 278,
+ "NOUN|NameType=Oth|r-conj": 279,
+ "NOUN|NameType=Oth|r-flat": 280,
+ "NOUN|NameType=Oth|r-nmod": 281,
+ "NOUN|NameType=Pro|_": 282,
+ "NOUN|NameType=Pro|r-nmod": 283,
+ "NOUN|NameType=Prs|_": 284,
+ "NOUN|NameType=Prs|l-nsubj": 285,
+ "NOUN|NameType=Prs|r-nmod": 286,
+ "NOUN|NounType=Class|Prefix=Yes|_": 287,
+ "NOUN|NounType=Class|Prefix=Yes|l-advcl": 288,
+ "NOUN|NounType=Class|Prefix=Yes|l-advmod": 289,
+ "NOUN|NounType=Class|Prefix=Yes|l-mark": 290,
+ "NOUN|NounType=Class|Prefix=Yes|l-nmod": 291,
+ "NOUN|NounType=Class|Prefix=Yes|l-nsubj": 292,
+ "NOUN|NounType=Class|Prefix=Yes|r-advcl": 293,
+ "NOUN|NounType=Class|Prefix=Yes|r-clf": 294,
+ "NOUN|NounType=Class|Prefix=Yes|r-nmod": 295,
+ "NOUN|NounType=Class|Prefix=Yes|r-obj": 296,
+ "NOUN|NounType=Class|_": 297,
+ "NOUN|NounType=Class|l-advcl": 298,
+ "NOUN|NounType=Class|l-advmod": 299,
+ "NOUN|NounType=Class|l-clf": 300,
+ "NOUN|NounType=Class|l-dislocated": 301,
+ "NOUN|NounType=Class|l-nmod": 302,
+ "NOUN|NounType=Class|l-nsubj": 303,
+ "NOUN|NounType=Class|l-obj": 304,
+ "NOUN|NounType=Class|l-obl": 305,
+ "NOUN|NounType=Class|r-acl": 306,
+ "NOUN|NounType=Class|r-advcl": 307,
+ "NOUN|NounType=Class|r-advmod": 308,
+ "NOUN|NounType=Class|r-appos": 309,
+ "NOUN|NounType=Class|r-cc": 310,
+ "NOUN|NounType=Class|r-ccomp": 311,
+ "NOUN|NounType=Class|r-clf": 312,
+ "NOUN|NounType=Class|r-compound": 313,
+ "NOUN|NounType=Class|r-conj": 314,
+ "NOUN|NounType=Class|r-dislocated": 315,
+ "NOUN|NounType=Class|r-fixed": 316,
+ "NOUN|NounType=Class|r-flat": 317,
+ "NOUN|NounType=Class|r-iobj": 318,
+ "NOUN|NounType=Class|r-list": 319,
+ "NOUN|NounType=Class|r-nmod": 320,
+ "NOUN|NounType=Class|r-nummod": 321,
+ "NOUN|NounType=Class|r-obj": 322,
+ "NOUN|NounType=Class|r-obl": 323,
+ "NOUN|NounType=Class|r-orphan": 324,
+ "NOUN|NounType=Class|r-xcomp": 325,
+ "NOUN|NounType=Class|root": 326,
+ "NOUN|NumType=Mult|_": 327,
+ "NOUN|NumType=Mult|r-advcl": 328,
+ "NOUN|NumType=Mult|r-nmod": 329,
+ "NOUN|NumType=Mult|r-obj": 330,
+ "NOUN|PartType=Enp|_": 331,
+ "NOUN|PartType=Enp|r-obj": 332,
+ "NOUN|PartType=Enp|r-obl": 333,
+ "NOUN|PartType=Int|_": 334,
+ "NOUN|PartType=Int|r-obj": 335,
+ "NOUN|PartType=Res|_": 336,
+ "NOUN|PartType=Res|r-nmod": 337,
+ "NOUN|PartType=Res|r-obj": 338,
+ "NOUN|Prefix=Yes|_": 339,
+ "NOUN|Prefix=Yes|l-acl": 340,
+ "NOUN|Prefix=Yes|l-advcl": 341,
+ "NOUN|Prefix=Yes|l-clf": 342,
+ "NOUN|Prefix=Yes|l-csubj": 343,
+ "NOUN|Prefix=Yes|l-dislocated": 344,
+ "NOUN|Prefix=Yes|l-flat": 345,
+ "NOUN|Prefix=Yes|l-nmod": 346,
+ "NOUN|Prefix=Yes|l-nsubj": 347,
+ "NOUN|Prefix=Yes|l-obj": 348,
+ "NOUN|Prefix=Yes|l-obl": 349,
+ "NOUN|Prefix=Yes|r-acl": 350,
+ "NOUN|Prefix=Yes|r-advcl": 351,
+ "NOUN|Prefix=Yes|r-advmod": 352,
+ "NOUN|Prefix=Yes|r-appos": 353,
+ "NOUN|Prefix=Yes|r-case": 354,
+ "NOUN|Prefix=Yes|r-cc": 355,
+ "NOUN|Prefix=Yes|r-ccomp": 356,
+ "NOUN|Prefix=Yes|r-clf": 357,
+ "NOUN|Prefix=Yes|r-compound": 358,
+ "NOUN|Prefix=Yes|r-conj": 359,
+ "NOUN|Prefix=Yes|r-dislocated": 360,
+ "NOUN|Prefix=Yes|r-fixed": 361,
+ "NOUN|Prefix=Yes|r-flat": 362,
+ "NOUN|Prefix=Yes|r-iobj": 363,
+ "NOUN|Prefix=Yes|r-list": 364,
+ "NOUN|Prefix=Yes|r-nmod": 365,
+ "NOUN|Prefix=Yes|r-nummod": 366,
+ "NOUN|Prefix=Yes|r-obj": 367,
+ "NOUN|Prefix=Yes|r-obl": 368,
+ "NOUN|Prefix=Yes|r-orphan": 369,
+ "NOUN|Prefix=Yes|r-xcomp": 370,
+ "NOUN|Prefix=Yes|root": 371,
+ "NOUN|_": 372,
+ "NOUN|l-acl": 373,
+ "NOUN|l-advcl": 374,
+ "NOUN|l-advmod": 375,
+ "NOUN|l-case": 376,
+ "NOUN|l-ccomp": 377,
+ "NOUN|l-compound": 378,
+ "NOUN|l-csubj": 379,
+ "NOUN|l-discourse": 380,
+ "NOUN|l-dislocated": 381,
+ "NOUN|l-expl": 382,
+ "NOUN|l-flat": 383,
+ "NOUN|l-iobj": 384,
+ "NOUN|l-mark": 385,
+ "NOUN|l-nmod": 386,
+ "NOUN|l-nsubj": 387,
+ "NOUN|l-nummod": 388,
+ "NOUN|l-obj": 389,
+ "NOUN|l-obl": 390,
+ "NOUN|l-orphan": 391,
+ "NOUN|l-vocative": 392,
+ "NOUN|r-acl": 393,
+ "NOUN|r-advcl": 394,
+ "NOUN|r-advmod": 395,
+ "NOUN|r-appos": 396,
+ "NOUN|r-case": 397,
+ "NOUN|r-cc": 398,
+ "NOUN|r-ccomp": 399,
+ "NOUN|r-clf": 400,
+ "NOUN|r-compound": 401,
+ "NOUN|r-conj": 402,
+ "NOUN|r-cop": 403,
+ "NOUN|r-discourse": 404,
+ "NOUN|r-dislocated": 405,
+ "NOUN|r-fixed": 406,
+ "NOUN|r-flat": 407,
+ "NOUN|r-iobj": 408,
+ "NOUN|r-list": 409,
+ "NOUN|r-mark": 410,
+ "NOUN|r-nmod": 411,
+ "NOUN|r-nsubj": 412,
+ "NOUN|r-nummod": 413,
+ "NOUN|r-obj": 414,
+ "NOUN|r-obl": 415,
+ "NOUN|r-orphan": 416,
+ "NOUN|r-parataxis": 417,
+ "NOUN|r-xcomp": 418,
+ "NOUN|root": 419,
+ "NUM": 420,
+ "NUM.": 421,
+ "NUM|Abbr=Yes|_": 422,
+ "NUM|Abbr=Yes|r-flat": 423,
+ "NUM|Abbr=Yes|r-nummod": 424,
+ "NUM|Abbr=Yes|r-obj": 425,
+ "NUM|Foreign=Yes|_": 426,
+ "NUM|Foreign=Yes|r-clf": 427,
+ "NUM|NumType=Mult|_": 428,
+ "NUM|NumType=Mult|l-advmod": 429,
+ "NUM|NumType=Mult|l-nummod": 430,
+ "NUM|NumType=Mult|r-advmod": 431,
+ "NUM|Prefix=Yes|_": 432,
+ "NUM|Prefix=Yes|l-nummod": 433,
+ "NUM|_": 434,
+ "NUM|l-advcl": 435,
+ "NUM|l-advmod": 436,
+ "NUM|l-case": 437,
+ "NUM|l-clf": 438,
+ "NUM|l-dep": 439,
+ "NUM|l-flat": 440,
+ "NUM|l-nmod": 441,
+ "NUM|l-nsubj": 442,
+ "NUM|l-nummod": 443,
+ "NUM|l-obl": 444,
+ "NUM|r-acl": 445,
+ "NUM|r-advmod": 446,
+ "NUM|r-appos": 447,
+ "NUM|r-ccomp": 448,
+ "NUM|r-compound": 449,
+ "NUM|r-conj": 450,
+ "NUM|r-det": 451,
+ "NUM|r-fixed": 452,
+ "NUM|r-flat": 453,
+ "NUM|r-iobj": 454,
+ "NUM|r-nmod": 455,
+ "NUM|r-nummod": 456,
+ "NUM|r-obj": 457,
+ "NUM|r-obl": 458,
+ "NUM|root": 459,
+ "PART": 460,
+ "PART.": 461,
+ "PART|NameType=Oth|_": 462,
+ "PART|NameType=Oth|l-advmod": 463,
+ "PART|NounType=Class|PartType=Emp|Prefix=Yes|_": 464,
+ "PART|NounType=Class|PartType=Emp|Prefix=Yes|l-mark": 465,
+ "PART|NounType=Class|PartType=Emp|_": 466,
+ "PART|NounType=Class|PartType=Emp|l-mark": 467,
+ "PART|NounType=Class|Prefix=Yes|_": 468,
+ "PART|NounType=Class|Prefix=Yes|l-mark": 469,
+ "PART|NumType=Mult|PartType=Emp|_": 470,
+ "PART|NumType=Mult|PartType=Emp|l-mark": 471,
+ "PART|PartType=Adj|_": 472,
+ "PART|PartType=Adj|l-mark": 473,
+ "PART|PartType=Adj|l-orphan": 474,
+ "PART|PartType=Adj|r-acl": 475,
+ "PART|PartType=Adj|r-compound": 476,
+ "PART|PartType=Adj|r-nmod": 477,
+ "PART|PartType=Adv|_": 478,
+ "PART|PartType=Adv|l-advmod": 479,
+ "PART|PartType=Adv|l-mark": 480,
+ "PART|PartType=Adv|r-advmod": 481,
+ "PART|PartType=Emp|Prefix=Yes|_": 482,
+ "PART|PartType=Emp|Prefix=Yes|l-advmod": 483,
+ "PART|PartType=Emp|Prefix=Yes|l-aux": 484,
+ "PART|PartType=Emp|Prefix=Yes|l-mark": 485,
+ "PART|PartType=Emp|_": 486,
+ "PART|PartType=Emp|l-advmod": 487,
+ "PART|PartType=Emp|l-case": 488,
+ "PART|PartType=Emp|l-discourse": 489,
+ "PART|PartType=Emp|l-mark": 490,
+ "PART|PartType=Emp|r-acl": 491,
+ "PART|PartType=Emp|r-advmod": 492,
+ "PART|PartType=Emp|r-aux": 493,
+ "PART|PartType=Emp|r-compound": 494,
+ "PART|PartType=Emp|r-det": 495,
+ "PART|PartType=Emp|r-fixed": 496,
+ "PART|PartType=Emp|r-mark": 497,
+ "PART|PartType=Emp|r-nmod": 498,
+ "PART|PartType=Enp|_": 499,
+ "PART|PartType=Enp|l-discourse": 500,
+ "PART|PartType=Enp|r-acl": 501,
+ "PART|PartType=Enp|r-advmod": 502,
+ "PART|PartType=Enp|r-compound": 503,
+ "PART|PartType=Enp|r-dep": 504,
+ "PART|PartType=Enp|r-det": 505,
+ "PART|PartType=Enp|r-discourse": 506,
+ "PART|PartType=Enp|r-fixed": 507,
+ "PART|PartType=Enp|r-obl": 508,
+ "PART|PartType=Int|_": 509,
+ "PART|PartType=Int|l-advmod": 510,
+ "PART|PartType=Int|l-mark": 511,
+ "PART|PartType=Int|r-acl": 512,
+ "PART|PartType=Int|r-advmod": 513,
+ "PART|PartType=Int|r-dep": 514,
+ "PART|PartType=Int|r-discourse": 515,
+ "PART|PartType=Int|r-nmod": 516,
+ "PART|PartType=Int|r-obj": 517,
+ "PART|PartType=Int|r-obl": 518,
+ "PART|PartType=Neg|_": 519,
+ "PART|PartType=Neg|l-advcl": 520,
+ "PART|PartType=Neg|l-advmod": 521,
+ "PART|PartType=Neg|l-aux": 522,
+ "PART|PartType=Neg|l-mark": 523,
+ "PART|PartType=Neg|r-acl": 524,
+ "PART|PartType=Neg|r-advmod": 525,
+ "PART|PartType=Neg|r-fixed": 526,
+ "PART|PartType=Res|_": 527,
+ "PART|PartType=Res|r-advmod": 528,
+ "PART|PartType=Res|r-discourse": 529,
+ "PART|PartType=Res|r-fixed": 530,
+ "PART|Prefix=Yes|_": 531,
+ "PART|Prefix=Yes|l-advmod": 532,
+ "PART|Prefix=Yes|l-aux": 533,
+ "PART|Prefix=Yes|l-mark": 534,
+ "PART|Prefix=Yes|r-acl": 535,
+ "PART|Prefix=Yes|r-nmod": 536,
+ "PART|_": 537,
+ "PART|l-advmod": 538,
+ "PART|l-discourse": 539,
+ "PART|l-mark": 540,
+ "PART|l-nsubj": 541,
+ "PART|r-acl": 542,
+ "PART|r-advmod": 543,
+ "PART|r-discourse": 544,
+ "PART|r-fixed": 545,
+ "PART|r-mark": 546,
+ "PART|r-obj": 547,
+ "PRON": 548,
+ "PRON.": 549,
+ "PRON|NounType=Class|_": 550,
+ "PRON|NounType=Class|r-clf": 551,
+ "PRON|PronType=Prs|_": 552,
+ "PRON|PronType=Prs|l-advmod": 553,
+ "PRON|PronType=Prs|l-expl": 554,
+ "PRON|PronType=Prs|l-nsubj": 555,
+ "PRON|PronType=Prs|l-obj": 556,
+ "PRON|PronType=Prs|l-obl": 557,
+ "PRON|PronType=Prs|r-advcl": 558,
+ "PRON|PronType=Prs|r-advmod": 559,
+ "PRON|PronType=Prs|r-ccomp": 560,
+ "PRON|PronType=Prs|r-clf": 561,
+ "PRON|PronType=Prs|r-conj": 562,
+ "PRON|PronType=Prs|r-nmod": 563,
+ "PRON|PronType=Prs|r-nsubj": 564,
+ "PRON|PronType=Prs|r-obj": 565,
+ "PRON|PronType=Prs|r-obl": 566,
+ "PRON|PronType=Prs|root": 567,
+ "PRON|PronType=Rcp|_": 568,
+ "PRON|PronType=Rcp|r-advmod": 569,
+ "PRON|PronType=Rcp|r-iobj": 570,
+ "PRON|PronType=Rcp|r-nmod": 571,
+ "PRON|PronType=Rcp|r-obj": 572,
+ "PRON|PronType=Rcp|r-obl": 573,
+ "PRON|_": 574,
+ "PRON|l-advcl": 575,
+ "PRON|l-advmod": 576,
+ "PRON|l-compound": 577,
+ "PRON|l-csubj": 578,
+ "PRON|l-dislocated": 579,
+ "PRON|l-expl": 580,
+ "PRON|l-iobj": 581,
+ "PRON|l-mark": 582,
+ "PRON|l-nsubj": 583,
+ "PRON|l-obj": 584,
+ "PRON|l-obl": 585,
+ "PRON|r-acl": 586,
+ "PRON|r-advmod": 587,
+ "PRON|r-appos": 588,
+ "PRON|r-ccomp": 589,
+ "PRON|r-compound": 590,
+ "PRON|r-conj": 591,
+ "PRON|r-det": 592,
+ "PRON|r-discourse": 593,
+ "PRON|r-fixed": 594,
+ "PRON|r-flat": 595,
+ "PRON|r-iobj": 596,
+ "PRON|r-nmod": 597,
+ "PRON|r-nsubj": 598,
+ "PRON|r-obj": 599,
+ "PRON|r-obl": 600,
+ "PROPN": 601,
+ "PROPN.": 602,
+ "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth|_": 603,
+ "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth|r-obj": 604,
+ "PROPN|Abbr=Yes|NameType=Com|_": 605,
+ "PROPN|Abbr=Yes|NameType=Com|r-advmod": 606,
+ "PROPN|Abbr=Yes|NameType=Com|r-nmod": 607,
+ "PROPN|Abbr=Yes|_": 608,
+ "PROPN|Abbr=Yes|l-nmod": 609,
+ "PROPN|Abbr=Yes|l-nsubj": 610,
+ "PROPN|Abbr=Yes|r-nmod": 611,
+ "PROPN|Foreign=Yes|NameType=Com|_": 612,
+ "PROPN|Foreign=Yes|NameType=Com|l-nsubj": 613,
+ "PROPN|Foreign=Yes|NameType=Com|r-list": 614,
+ "PROPN|Foreign=Yes|NameType=Com|r-nmod": 615,
+ "PROPN|Foreign=Yes|NameType=Com|r-obl": 616,
+ "PROPN|Foreign=Yes|NameType=Geo|_": 617,
+ "PROPN|Foreign=Yes|NameType=Geo|r-obj": 618,
+ "PROPN|Foreign=Yes|NameType=Geo|r-obl": 619,
+ "PROPN|Foreign=Yes|NameType=Giv|_": 620,
+ "PROPN|Foreign=Yes|NameType=Giv|l-nsubj": 621,
+ "PROPN|Foreign=Yes|NameType=Oth|_": 622,
+ "PROPN|Foreign=Yes|NameType=Oth|r-conj": 623,
+ "PROPN|Foreign=Yes|NameType=Oth|r-flat": 624,
+ "PROPN|Foreign=Yes|NameType=Oth|r-nmod": 625,
+ "PROPN|Foreign=Yes|NameType=Prs|_": 626,
+ "PROPN|Foreign=Yes|NameType=Prs|l-flat": 627,
+ "PROPN|Foreign=Yes|NameType=Prs|l-nsubj": 628,
+ "PROPN|Foreign=Yes|NameType=Prs|r-conj": 629,
+ "PROPN|Foreign=Yes|NameType=Prs|r-flat": 630,
+ "PROPN|Foreign=Yes|NameType=Prs|r-nmod": 631,
+ "PROPN|Foreign=Yes|NameType=Prs|r-obj": 632,
+ "PROPN|Foreign=Yes|NameType=Prs|r-obl": 633,
+ "PROPN|Foreign=Yes|NameType=Sur|_": 634,
+ "PROPN|Foreign=Yes|NameType=Sur|r-flat": 635,
+ "PROPN|Foreign=Yes|_": 636,
+ "PROPN|Foreign=Yes|l-flat": 637,
+ "PROPN|Foreign=Yes|l-nmod": 638,
+ "PROPN|Foreign=Yes|l-nsubj": 639,
+ "PROPN|Foreign=Yes|l-obl": 640,
+ "PROPN|Foreign=Yes|r-appos": 641,
+ "PROPN|Foreign=Yes|r-ccomp": 642,
+ "PROPN|Foreign=Yes|r-compound": 643,
+ "PROPN|Foreign=Yes|r-conj": 644,
+ "PROPN|Foreign=Yes|r-flat": 645,
+ "PROPN|Foreign=Yes|r-iobj": 646,
+ "PROPN|Foreign=Yes|r-list": 647,
+ "PROPN|Foreign=Yes|r-nmod": 648,
+ "PROPN|Foreign=Yes|r-nsubj": 649,
+ "PROPN|Foreign=Yes|r-obj": 650,
+ "PROPN|Foreign=Yes|r-obl": 651,
+ "PROPN|Foreign=Yes|root": 652,
+ "PROPN|NameType=Com|_": 653,
+ "PROPN|NameType=Com|l-nsubj": 654,
+ "PROPN|NameType=Com|l-obl": 655,
+ "PROPN|NameType=Com|r-appos": 656,
+ "PROPN|NameType=Com|r-conj": 657,
+ "PROPN|NameType=Com|r-flat": 658,
+ "PROPN|NameType=Com|r-list": 659,
+ "PROPN|NameType=Com|r-nmod": 660,
+ "PROPN|NameType=Com|r-nsubj": 661,
+ "PROPN|NameType=Com|r-obj": 662,
+ "PROPN|NameType=Com|r-obl": 663,
+ "PROPN|NameType=Geo|_": 664,
+ "PROPN|NameType=Geo|l-nsubj": 665,
+ "PROPN|NameType=Geo|l-obl": 666,
+ "PROPN|NameType=Geo|r-compound": 667,
+ "PROPN|NameType=Geo|r-conj": 668,
+ "PROPN|NameType=Geo|r-flat": 669,
+ "PROPN|NameType=Geo|r-list": 670,
+ "PROPN|NameType=Geo|r-nmod": 671,
+ "PROPN|NameType=Geo|r-nsubj": 672,
+ "PROPN|NameType=Geo|r-nummod": 673,
+ "PROPN|NameType=Geo|r-obj": 674,
+ "PROPN|NameType=Geo|r-obl": 675,
+ "PROPN|NameType=Geo|root": 676,
+ "PROPN|NameType=Giv|_": 677,
+ "PROPN|NameType=Giv|l-dislocated": 678,
+ "PROPN|NameType=Giv|l-nsubj": 679,
+ "PROPN|NameType=Giv|l-obl": 680,
+ "PROPN|NameType=Giv|r-acl": 681,
+ "PROPN|NameType=Giv|r-appos": 682,
+ "PROPN|NameType=Giv|r-ccomp": 683,
+ "PROPN|NameType=Giv|r-conj": 684,
+ "PROPN|NameType=Giv|r-flat": 685,
+ "PROPN|NameType=Giv|r-list": 686,
+ "PROPN|NameType=Giv|r-nmod": 687,
+ "PROPN|NameType=Giv|r-nsubj": 688,
+ "PROPN|NameType=Giv|r-obj": 689,
+ "PROPN|NameType=Giv|r-obl": 690,
+ "PROPN|NameType=Giv|root": 691,
+ "PROPN|NameType=Nat|_": 692,
+ "PROPN|NameType=Nat|l-csubj": 693,
+ "PROPN|NameType=Nat|l-nsubj": 694,
+ "PROPN|NameType=Nat|l-obl": 695,
+ "PROPN|NameType=Nat|r-acl": 696,
+ "PROPN|NameType=Nat|r-appos": 697,
+ "PROPN|NameType=Nat|r-compound": 698,
+ "PROPN|NameType=Nat|r-conj": 699,
+ "PROPN|NameType=Nat|r-flat": 700,
+ "PROPN|NameType=Nat|r-list": 701,
+ "PROPN|NameType=Nat|r-nmod": 702,
+ "PROPN|NameType=Nat|r-nummod": 703,
+ "PROPN|NameType=Nat|r-obj": 704,
+ "PROPN|NameType=Nat|r-obl": 705,
+ "PROPN|NameType=Oth|_": 706,
+ "PROPN|NameType=Oth|l-dislocated": 707,
+ "PROPN|NameType=Oth|l-nsubj": 708,
+ "PROPN|NameType=Oth|r-acl": 709,
+ "PROPN|NameType=Oth|r-appos": 710,
+ "PROPN|NameType=Oth|r-compound": 711,
+ "PROPN|NameType=Oth|r-conj": 712,
+ "PROPN|NameType=Oth|r-flat": 713,
+ "PROPN|NameType=Oth|r-nmod": 714,
+ "PROPN|NameType=Oth|r-obj": 715,
+ "PROPN|NameType=Oth|r-obl": 716,
+ "PROPN|NameType=Oth|root": 717,
+ "PROPN|NameType=Pro|_": 718,
+ "PROPN|NameType=Pro|l-nsubj": 719,
+ "PROPN|NameType=Pro|l-obl": 720,
+ "PROPN|NameType=Pro|r-advcl": 721,
+ "PROPN|NameType=Pro|r-flat": 722,
+ "PROPN|NameType=Pro|r-nmod": 723,
+ "PROPN|NameType=Pro|r-obj": 724,
+ "PROPN|NameType=Prs|_": 725,
+ "PROPN|NameType=Prs|l-dislocated": 726,
+ "PROPN|NameType=Prs|l-nsubj": 727,
+ "PROPN|NameType=Prs|l-obl": 728,
+ "PROPN|NameType=Prs|l-vocative": 729,
+ "PROPN|NameType=Prs|r-conj": 730,
+ "PROPN|NameType=Prs|r-discourse": 731,
+ "PROPN|NameType=Prs|r-flat": 732,
+ "PROPN|NameType=Prs|r-list": 733,
+ "PROPN|NameType=Prs|r-nmod": 734,
+ "PROPN|NameType=Prs|r-obj": 735,
+ "PROPN|NameType=Prs|r-obl": 736,
+ "PROPN|NameType=Prs|r-vocative": 737,
+ "PROPN|NameType=Sur|_": 738,
+ "PROPN|NameType=Sur|l-nsubj": 739,
+ "PROPN|NameType=Sur|r-flat": 740,
+ "PROPN|NameType=Sur|r-nmod": 741,
+ "PROPN|NounType=Class|_": 742,
+ "PROPN|NounType=Class|r-clf": 743,
+ "PROPN|Prefix=Yes|_": 744,
+ "PROPN|Prefix=Yes|l-nsubj": 745,
+ "PROPN|Prefix=Yes|r-nmod": 746,
+ "PROPN|_": 747,
+ "PROPN|l-advmod": 748,
+ "PROPN|l-nsubj": 749,
+ "PROPN|l-obl": 750,
+ "PROPN|r-acl": 751,
+ "PROPN|r-advmod": 752,
+ "PROPN|r-appos": 753,
+ "PROPN|r-clf": 754,
+ "PROPN|r-compound": 755,
+ "PROPN|r-conj": 756,
+ "PROPN|r-fixed": 757,
+ "PROPN|r-flat": 758,
+ "PROPN|r-iobj": 759,
+ "PROPN|r-list": 760,
+ "PROPN|r-nmod": 761,
+ "PROPN|r-obj": 762,
+ "PROPN|r-obl": 763,
+ "PROPN|root": 764,
+ "PUNCT": 765,
+ "PUNCT.": 766,
+ "PUNCT|NounType=Class|_": 767,
+ "PUNCT|NounType=Class|r-punct": 768,
+ "PUNCT|_": 769,
+ "PUNCT|l-advmod": 770,
+ "PUNCT|l-dep": 771,
+ "PUNCT|l-punct": 772,
+ "PUNCT|r-dep": 773,
+ "PUNCT|r-punct": 774,
+ "SCONJ": 775,
+ "SCONJ.": 776,
+ "SCONJ|NumType=Mult|_": 777,
+ "SCONJ|NumType=Mult|l-mark": 778,
+ "SCONJ|Prefix=Yes|_": 779,
+ "SCONJ|Prefix=Yes|l-cc": 780,
+ "SCONJ|Prefix=Yes|l-mark": 781,
+ "SCONJ|VerbType=Cop|_": 782,
+ "SCONJ|VerbType=Cop|l-mark": 783,
+ "SCONJ|_": 784,
+ "SCONJ|l-advmod": 785,
+ "SCONJ|l-case": 786,
+ "SCONJ|l-cc": 787,
+ "SCONJ|l-discourse": 788,
+ "SCONJ|l-mark": 789,
+ "SCONJ|l-nsubj": 790,
+ "SCONJ|l-orphan": 791,
+ "SCONJ|r-advcl": 792,
+ "SCONJ|r-compound": 793,
+ "SCONJ|r-fixed": 794,
+ "SCONJ|r-flat": 795,
+ "SCONJ|r-mark": 796,
+ "SCONJ|r-orphan": 797,
+ "SCONJ|root": 798,
+ "SYM": 799,
+ "SYM.": 800,
+ "SYM|_": 801,
+ "SYM|l-dep": 802,
+ "SYM|r-clf": 803,
+ "SYM|r-nmod": 804,
+ "SYM|r-obj": 805,
+ "SYM|r-obl": 806,
+ "SYM|r-xcomp": 807,
+ "VERB": 808,
+ "VERB.": 809,
+ "VERB|Abbr=Yes|_": 810,
+ "VERB|Abbr=Yes|r-acl": 811,
+ "VERB|Foreign=Yes|_": 812,
+ "VERB|Foreign=Yes|l-nsubj": 813,
+ "VERB|Foreign=Yes|r-acl": 814,
+ "VERB|Foreign=Yes|r-advcl": 815,
+ "VERB|Foreign=Yes|r-ccomp": 816,
+ "VERB|Foreign=Yes|r-compound": 817,
+ "VERB|Foreign=Yes|r-conj": 818,
+ "VERB|Foreign=Yes|r-flat": 819,
+ "VERB|Foreign=Yes|r-nmod": 820,
+ "VERB|Foreign=Yes|r-xcomp": 821,
+ "VERB|Foreign=Yes|root": 822,
+ "VERB|NounType=Class|_": 823,
+ "VERB|NounType=Class|r-acl": 824,
+ "VERB|NounType=Class|r-compound": 825,
+ "VERB|PartType=Adj|_": 826,
+ "VERB|PartType=Adj|r-acl": 827,
+ "VERB|Prefix=Yes|_": 828,
+ "VERB|Prefix=Yes|l-acl": 829,
+ "VERB|Prefix=Yes|l-nsubj": 830,
+ "VERB|Prefix=Yes|r-acl": 831,
+ "VERB|Prefix=Yes|r-advcl": 832,
+ "VERB|Prefix=Yes|r-ccomp": 833,
+ "VERB|Prefix=Yes|r-compound": 834,
+ "VERB|Prefix=Yes|r-conj": 835,
+ "VERB|Prefix=Yes|r-parataxis": 836,
+ "VERB|Prefix=Yes|root": 837,
+ "VERB|VerbType=Cop|_": 838,
+ "VERB|VerbType=Cop|l-advmod": 839,
+ "VERB|VerbType=Cop|l-cop": 840,
+ "VERB|VerbType=Cop|r-acl": 841,
+ "VERB|VerbType=Cop|r-advcl": 842,
1760
+ "VERB|VerbType=Cop|r-advcl": 842,
1761
+ "VERB|VerbType=Cop|r-ccomp": 843,
1762
+ "VERB|VerbType=Cop|r-compound": 844,
1763
+ "VERB|VerbType=Cop|r-parataxis": 845,
1764
+ "VERB|VerbType=Cop|root": 846,
1765
+ "VERB|_": 847,
1766
+ "VERB|l-acl": 848,
1767
+ "VERB|l-advcl": 849,
1768
+ "VERB|l-advmod": 850,
1769
+ "VERB|l-case": 851,
1770
+ "VERB|l-cc": 852,
1771
+ "VERB|l-ccomp": 853,
1772
+ "VERB|l-compound": 854,
1773
+ "VERB|l-conj": 855,
1774
+ "VERB|l-cop": 856,
1775
+ "VERB|l-csubj": 857,
1776
+ "VERB|l-discourse": 858,
1777
+ "VERB|l-dislocated": 859,
1778
+ "VERB|l-mark": 860,
1779
+ "VERB|l-nsubj": 861,
1780
+ "VERB|l-obl": 862,
1781
+ "VERB|l-orphan": 863,
1782
+ "VERB|l-xcomp": 864,
1783
+ "VERB|r-acl": 865,
1784
+ "VERB|r-advcl": 866,
1785
+ "VERB|r-advmod": 867,
1786
+ "VERB|r-appos": 868,
1787
+ "VERB|r-case": 869,
1788
+ "VERB|r-cc": 870,
1789
+ "VERB|r-ccomp": 871,
1790
+ "VERB|r-clf": 872,
1791
+ "VERB|r-compound": 873,
1792
+ "VERB|r-conj": 874,
1793
+ "VERB|r-dep": 875,
1794
+ "VERB|r-det": 876,
1795
+ "VERB|r-discourse": 877,
1796
+ "VERB|r-fixed": 878,
1797
+ "VERB|r-flat": 879,
1798
+ "VERB|r-list": 880,
1799
+ "VERB|r-mark": 881,
1800
+ "VERB|r-nmod": 882,
1801
+ "VERB|r-nsubj": 883,
1802
+ "VERB|r-obj": 884,
1803
+ "VERB|r-obl": 885,
1804
+ "VERB|r-orphan": 886,
1805
+ "VERB|r-parataxis": 887,
1806
+ "VERB|r-punct": 888,
1807
+ "VERB|r-xcomp": 889,
1808
+ "VERB|root": 890
1809
+ },
1810
+ "max_position_embeddings": 131072,
1811
+ "mlp_bias": false,
1812
+ "model_type": "llama",
1813
+ "num_attention_heads": 32,
1814
+ "num_hidden_layers": 16,
1815
+ "num_key_value_heads": 8,
1816
+ "pretraining_tp": 1,
1817
+ "rms_norm_eps": 1e-05,
1818
+ "rope_scaling": {
1819
+ "factor": 32.0,
1820
+ "high_freq_factor": 4.0,
1821
+ "low_freq_factor": 1.0,
1822
+ "original_max_position_embeddings": 8192,
1823
+ "rope_type": "llama3"
1824
+ },
1825
+ "rope_theta": 500000.0,
1826
+ "tie_word_embeddings": true,
1827
+ "tokenizer_class": "PreTrainedTokenizerFast",
1828
+ "torch_dtype": "float32",
1829
+ "transformers_version": "4.48.3",
1830
+ "use_cache": false,
1831
+ "vocab_size": 128256
1832
+ }
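Reading aid (not part of the commit): the `label2id` keys above appear to follow the scheme built in `maker.py`, where an arc label is `UPOS`, optional `FEATS`, and a final field that is `_` (no arc), a bare relation for the root, or `l-`/`r-` plus the relation depending on whether the token precedes or follows its head. The sketch below decodes such a key under that assumption; `parse_label` is a hypothetical helper, not a function from this repository.

```python
def parse_label(label):
    """Decode one label2id key into its parts (illustrative helper only)."""
    if "|" not in label:
        # Plain tagging label such as "VERB", "VERB." or "B-VERB".
        return {"kind": "tag", "tag": label}
    parts = label.split("|")
    upos, last = parts[0], parts[-1]
    # Anything between UPOS and the last field is the FEATS column.
    feats = "|".join(parts[1:-1]) or "_"
    if last == "_":
        direction, deprel = None, None          # no arc for this pair
    elif last.startswith(("l-", "r-")):
        direction, deprel = last[0], last[2:]   # "l": token precedes its head
    else:
        direction, deprel = None, last          # head is 0, e.g. "root"
    return {"kind": "arc", "upos": upos, "feats": feats,
            "direction": direction, "deprel": deprel}

print(parse_label("PROPN|NameType=Prs|l-nsubj"))
print(parse_label("VERB|root"))
print(parse_label("PUNCT|_"))
```

The dictionary form is only a convenience for inspection; the model itself works with the integer ids.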
maker.py ADDED
@@ -0,0 +1,120 @@
+ #! /usr/bin/python3
+ src="scb10x/llama3.2-typhoon2-1b"
+ tgt="KoichiYasuoka/llama3.2-typhoon2-1b-ud-embeds"
+ url="https://github.com/KoichiYasuoka/spaCy-Thai"
+ import os
+ d=os.path.basename(url)
+ os.system("test -d "+d+" || git clone --depth=1 "+url)
+ os.system("for F in train dev test ; do cp "+d+"/UD_Thai-Corpora/th_tud-ud-$F.conllu $F.conllu ; done")
+ class UDEmbedsDataset(object):
+   def __init__(self,conllu,tokenizer,oldtokenizer=None,embeddings=None):
+     self.conllu=open(conllu,"r",encoding="utf-8")
+     self.tokenizer=tokenizer
+     self.oldtokenizer=oldtokenizer if oldtokenizer else tokenizer
+     self.embeddings=embeddings
+     self.seeks=[0]
+     label=set(["SYM","SYM.","SYM|_"])
+     dep=set()
+     s=self.conllu.readline()
+     while s!="":
+       if s=="\n":
+         self.seeks.append(self.conllu.tell())
+       else:
+         w=s.split("\t")
+         if len(w)==10:
+           if w[0].isdecimal():
+             p=w[3]
+             q="" if w[5]=="_" else "|"+w[5]
+             d=("|" if w[6]=="0" else "|l-" if int(w[0])<int(w[6]) else "|r-")+w[7]
+             for k in [p,p+".","B-"+p,"B-"+p+".","I-"+p,"I-"+p+".",p+q+"|_",p+q+d]:
+               label.add(k)
+       s=self.conllu.readline()
+     self.label2id={l:i for i,l in enumerate(sorted(label))}
+   def __call__(*args):
+     lid={l:i for i,l in enumerate(sorted(set(sum([list(t.label2id) for t in args],[]))))}
+     for t in args:
+       t.label2id=lid
+     return lid
+   def __del__(self):
+     self.conllu.close()
+   __len__=lambda self:len(self.seeks)-1
+   def __getitem__(self,i):
+     import torch
+     self.conllu.seek(self.seeks[i])
+     c,t,s=[],[""],False
+     while t[0]!="\n":
+       t=self.conllu.readline().split("\t")
+       if len(t)==10 and t[0].isdecimal():
+         if s:
+           t[1]=" "+t[1]
+         c.append(t)
+         s=t[9].find("SpaceAfter=No")<0
+     x=[True if t[6]=="0" or int(t[6])>j or sum([1 if int(c[i][6])==j+1 else 0 for i in range(j+1,len(c))])>0 else False for j,t in enumerate(c)]
+     v=self.tokenizer([t[1] for t in c],add_special_tokens=False)["input_ids"]
+     ids,upos=[self.tokenizer.bos_token_id],["SYM."]
+     for i,(j,k) in enumerate(zip(v,c)):
+       if j==[]:
+         j=[self.tokenizer.unk_token_id]
+       p=k[3] if x[i] else k[3]+"."
+       ids+=j
+       upos+=[p] if len(j)==1 else ["B-"+p]+["I-"+p]*(len(j)-1)
+     x=[True if t[6]=="0" or int(t[6])>j or sum([1 if int(c[i][6])==j+1 else 0 for i in range(j+1,len(c))])>0 else False for j,t in enumerate(c)]
+     if len(x)<88:
+       x=[True]*len(x)
+       w=(len(x)+1)*(len(x)+2)/2+len(ids)
+     else:
+       w=sum([len(x)-i+1 if b else 0 for i,b in enumerate(x)])+len(ids)+1
+       for i in range(len(x)):
+         if x[i]==False and w+len(x)-i<4096:
+           x[i]=True
+           w+=len(x)-i+1
+     v=self.oldtokenizer([t[1] for t in c],add_special_tokens=False)["input_ids"]
+     p=[t[3] if t[5]=="_" else t[3]+"|"+t[5] for i,t in enumerate(c)]
+     d=[t[7] if t[6]=="0" else "l-"+t[7] if int(t[0])<int(t[6]) else "r-"+t[7] for t in c]
+     idx=[-1]
+     upos.append("SYM|_")
+     for i in range(len(x)):
+       if x[i]:
+         idx.append(i)
+         upos.append(p[i]+"|"+d[i] if c[i][6]=="0" else p[i]+"|_")
+         for j in range(i+1,len(x)):
+           idx.append(j)
+           upos.append(p[j]+"|"+d[j] if int(c[j][6])==i+1 else p[i]+"|"+d[i] if int(c[i][6])==j+1 else p[j]+"|_")
+         if i>0 and w>4096:
+           while w>4096:
+             if upos[-1].endswith("|_"):
+               upos.pop(-1)
+               idx.pop(-1)
+               w-=1
+             else:
+               break
+         idx.append(-1)
+         upos.append("SYM|_")
+     with torch.no_grad():
+       m=[]
+       for j in v:
+         if j==[]:
+           j=[self.tokenizer.convert_tokens_to_ids("<|python_tag|>")]
+         m.append(self.embeddings[j,:].sum(axis=0))
+       m.append(self.embeddings[self.tokenizer.eos_token_id])
+       emb=torch.stack(m)
+     return{"inputs_embeds":torch.vstack((self.embeddings[ids,:],emb[idx,:])),"labels":[self.label2id[p] for p in upos]}
+ from transformers import AutoTokenizer,AutoConfig,AutoModelForTokenClassification,DefaultDataCollator,TrainingArguments,Trainer
+ from tokenizers.pre_tokenizers import Sequence,Split,Whitespace
+ from tokenizers import Regex
+ from copy import deepcopy
+ otk=AutoTokenizer.from_pretrained(src)
+ ntk=deepcopy(otk)
+ ntk.backend_tokenizer.pre_tokenizer=Sequence([Whitespace(),Split(Regex("[\u0e40-\u0e44]?[\u0e01-\u0e2e][\u0e30-\u0e3a\u0e45\u0e47-\u0e4e]*|."),"isolated"),otk.backend_tokenizer.pre_tokenizer])
+ trainDS=UDEmbedsDataset("train.conllu",ntk,otk)
+ devDS=UDEmbedsDataset("dev.conllu",ntk,otk)
+ testDS=UDEmbedsDataset("test.conllu",ntk,otk)
+ lid=trainDS(devDS,testDS)
+ cfg=AutoConfig.from_pretrained(src,num_labels=len(lid),label2id=lid,id2label={i:l for l,i in lid.items()},ignore_mismatched_sizes=True,trust_remote_code=True)
+ mdl=AutoModelForTokenClassification.from_pretrained(src,config=cfg,ignore_mismatched_sizes=True,trust_remote_code=True)
+ trainDS.embeddings=mdl.get_input_embeddings().weight
+ arg=TrainingArguments(num_train_epochs=3,per_device_train_batch_size=1,dataloader_pin_memory=False,output_dir=tgt,overwrite_output_dir=True,save_total_limit=2,learning_rate=5e-05,warmup_ratio=0.1,save_safetensors=False)
+ trn=Trainer(args=arg,data_collator=DefaultDataCollator(),model=mdl,train_dataset=trainDS)
+ trn.train()
+ trn.save_model(tgt)
+ otk.save_pretrained(tgt)
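Reading aid (not part of the commit): `maker.py` adds a `Split(Regex(...),"isolated")` step to the pre-tokenizer, whose pattern matches an optional leading vowel, one Thai consonant, and any following vowel or tone marks, falling back to a single character. The sketch below applies the same pattern with Python's standard `re` module to the widget sentence from the README. It assumes `re` treats these character classes the same way as the regex engine behind `tokenizers.Regex`, and it ignores the `Whitespace()` step and the original pre-tokenizer in the sequence; `split_thai` is a hypothetical helper name.

```python
import re

# Same character classes as the Split(Regex(...)) pattern in maker.py.
THAI_UNIT = re.compile(
    "[\u0e40-\u0e44]?[\u0e01-\u0e2e][\u0e30-\u0e3a\u0e45\u0e47-\u0e4e]*|."
)

def split_thai(text):
    """Return the consecutive regex matches, roughly one consonant cluster each."""
    return THAI_UNIT.findall(text)

text = "หลายหัวดีกว่าหัวเดียว"
pieces = split_thai(text)
print(pieces)
print(len(pieces), "pieces from", len(text), "characters")
```

Because every character is covered by either the cluster alternative or the `.` fallback, joining the pieces reproduces the input unchanged.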
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:36264c29142b869354148d72bdaa3f30b0d208bdd962ef3a18dec7b2e5a8f080
+ size 4950610458
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "bos_token": {
+     "content": "<|begin_of_text|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|eot_id|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|end_of_text|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b9e4e7fb171f92fd137b777cc2714bf87d11576700a1dcd7a399e7bbe39537b
+ size 17209920
tokenizer_config.json ADDED
@@ -0,0 +1,2063 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "128000": {
4
+ "content": "<|begin_of_text|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "128001": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "128002": {
20
+ "content": "<|reserved_special_token_0|>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "128003": {
28
+ "content": "<|reserved_special_token_1|>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128004": {
36
+ "content": "<|finetune_right_pad_id|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "128005": {
44
+ "content": "<|reserved_special_token_2|>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "128006": {
52
+ "content": "<|start_header_id|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "128007": {
60
+ "content": "<|end_header_id|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "128008": {
68
+ "content": "<|eom_id|>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "128009": {
76
+ "content": "<|eot_id|>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "128010": {
84
+ "content": "<|python_tag|>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "128011": {
92
+ "content": "<|reserved_special_token_3|>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "128012": {
100
+ "content": "<|reserved_special_token_4|>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "128013": {
108
+ "content": "<|reserved_special_token_5|>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "128014": {
116
+ "content": "<|reserved_special_token_6|>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "128015": {
124
+ "content": "<|reserved_special_token_7|>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "128016": {
132
+ "content": "<|reserved_special_token_8|>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "128017": {
140
+ "content": "<|reserved_special_token_9|>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "128018": {
148
+ "content": "<|reserved_special_token_10|>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "128019": {
156
+ "content": "<|reserved_special_token_11|>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "128020": {
164
+ "content": "<|reserved_special_token_12|>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ },
171
+ "128021": {
172
+ "content": "<|reserved_special_token_13|>",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": true
178
+ },
179
+ "128022": {
180
+ "content": "<|reserved_special_token_14|>",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": true
186
+ },
187
+ "128023": {
188
+ "content": "<|reserved_special_token_15|>",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": true
194
+ },
195
+ "128024": {
196
+ "content": "<|reserved_special_token_16|>",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": true
202
+ },
203
+ "128025": {
204
+ "content": "<|reserved_special_token_17|>",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": true
210
+ },
211
+ "128026": {
212
+ "content": "<|reserved_special_token_18|>",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": true
218
+ },
219
+ "128027": {
220
+ "content": "<|reserved_special_token_19|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "128028": {
228
+ "content": "<|reserved_special_token_20|>",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "128029": {
236
+ "content": "<|reserved_special_token_21|>",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "128030": {
244
+ "content": "<|reserved_special_token_22|>",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "128031": {
252
+ "content": "<|reserved_special_token_23|>",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "128032": {
260
+ "content": "<|reserved_special_token_24|>",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "128033": {
268
+ "content": "<|reserved_special_token_25|>",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": true
274
+ },
275
+ "128034": {
276
+ "content": "<|reserved_special_token_26|>",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": true
282
+ },
283
+ "128035": {
284
+ "content": "<|reserved_special_token_27|>",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": true
290
+ },
291
+ "128036": {
292
+ "content": "<|reserved_special_token_28|>",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": true
298
+ },
299
+ "128037": {
300
+ "content": "<|reserved_special_token_29|>",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": true
306
+ },
307
+ "128038": {
308
+ "content": "<|reserved_special_token_30|>",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": true
314
+ },
315
+ "128039": {
316
+ "content": "<|reserved_special_token_31|>",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": true
322
+ },
323
+ "128040": {
324
+ "content": "<|reserved_special_token_32|>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "128041": {
332
+ "content": "<|reserved_special_token_33|>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "128042": {
340
+ "content": "<|reserved_special_token_34|>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "128043": {
348
+ "content": "<|reserved_special_token_35|>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "128044": {
356
+ "content": "<|reserved_special_token_36|>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "128045": {
364
+ "content": "<|reserved_special_token_37|>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "128046": {
372
+ "content": "<|reserved_special_token_38|>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "128047": {
380
+ "content": "<|reserved_special_token_39|>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "128048": {
388
+ "content": "<|reserved_special_token_40|>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "128049": {
396
+ "content": "<|reserved_special_token_41|>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "128050": {
404
+ "content": "<|reserved_special_token_42|>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "128051": {
412
+ "content": "<|reserved_special_token_43|>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "128052": {
420
+ "content": "<|reserved_special_token_44|>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "128053": {
428
+ "content": "<|reserved_special_token_45|>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "128054": {
436
+ "content": "<|reserved_special_token_46|>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "128055": {
444
+ "content": "<|reserved_special_token_47|>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "128056": {
452
+ "content": "<|reserved_special_token_48|>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "128057": {
460
+ "content": "<|reserved_special_token_49|>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "128058": {
468
+ "content": "<|reserved_special_token_50|>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "128059": {
476
+ "content": "<|reserved_special_token_51|>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "128060": {
484
+ "content": "<|reserved_special_token_52|>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "128061": {
492
+ "content": "<|reserved_special_token_53|>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "128062": {
500
+ "content": "<|reserved_special_token_54|>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "128063": {
508
+ "content": "<|reserved_special_token_55|>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "128064": {
516
+ "content": "<|reserved_special_token_56|>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "128065": {
524
+ "content": "<|reserved_special_token_57|>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "128066": {
532
+ "content": "<|reserved_special_token_58|>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "128067": {
540
+ "content": "<|reserved_special_token_59|>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "128068": {
548
+ "content": "<|reserved_special_token_60|>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "128069": {
556
+ "content": "<|reserved_special_token_61|>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "128070": {
564
+ "content": "<|reserved_special_token_62|>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "128071": {
572
+ "content": "<|reserved_special_token_63|>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "128072": {
580
+ "content": "<|reserved_special_token_64|>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "128073": {
588
+ "content": "<|reserved_special_token_65|>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "128074": {
596
+ "content": "<|reserved_special_token_66|>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "128075": {
604
+ "content": "<|reserved_special_token_67|>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "128076": {
612
+ "content": "<|reserved_special_token_68|>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128077": {
+ "content": "<|reserved_special_token_69|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128078": {
+ "content": "<|reserved_special_token_70|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128079": {
+ "content": "<|reserved_special_token_71|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128080": {
+ "content": "<|reserved_special_token_72|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128081": {
+ "content": "<|reserved_special_token_73|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128082": {
+ "content": "<|reserved_special_token_74|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128083": {
+ "content": "<|reserved_special_token_75|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128084": {
+ "content": "<|reserved_special_token_76|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128085": {
+ "content": "<|reserved_special_token_77|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128086": {
+ "content": "<|reserved_special_token_78|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128087": {
+ "content": "<|reserved_special_token_79|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128088": {
+ "content": "<|reserved_special_token_80|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128089": {
+ "content": "<|reserved_special_token_81|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128090": {
+ "content": "<|reserved_special_token_82|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128091": {
+ "content": "<|reserved_special_token_83|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128092": {
+ "content": "<|reserved_special_token_84|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128093": {
+ "content": "<|reserved_special_token_85|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128094": {
+ "content": "<|reserved_special_token_86|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128095": {
+ "content": "<|reserved_special_token_87|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128096": {
+ "content": "<|reserved_special_token_88|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128097": {
+ "content": "<|reserved_special_token_89|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128098": {
+ "content": "<|reserved_special_token_90|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128099": {
+ "content": "<|reserved_special_token_91|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128100": {
+ "content": "<|reserved_special_token_92|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128101": {
+ "content": "<|reserved_special_token_93|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128102": {
+ "content": "<|reserved_special_token_94|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128103": {
+ "content": "<|reserved_special_token_95|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128104": {
+ "content": "<|reserved_special_token_96|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128105": {
+ "content": "<|reserved_special_token_97|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128106": {
+ "content": "<|reserved_special_token_98|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128107": {
+ "content": "<|reserved_special_token_99|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128108": {
+ "content": "<|reserved_special_token_100|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128109": {
+ "content": "<|reserved_special_token_101|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128110": {
+ "content": "<|reserved_special_token_102|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128111": {
+ "content": "<|reserved_special_token_103|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128112": {
+ "content": "<|reserved_special_token_104|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128113": {
+ "content": "<|reserved_special_token_105|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128114": {
+ "content": "<|reserved_special_token_106|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128115": {
+ "content": "<|reserved_special_token_107|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128116": {
+ "content": "<|reserved_special_token_108|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128117": {
+ "content": "<|reserved_special_token_109|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128118": {
+ "content": "<|reserved_special_token_110|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128119": {
+ "content": "<|reserved_special_token_111|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128120": {
+ "content": "<|reserved_special_token_112|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128121": {
+ "content": "<|reserved_special_token_113|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128122": {
+ "content": "<|reserved_special_token_114|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128123": {
+ "content": "<|reserved_special_token_115|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128124": {
+ "content": "<|reserved_special_token_116|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128125": {
+ "content": "<|reserved_special_token_117|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128126": {
+ "content": "<|reserved_special_token_118|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128127": {
+ "content": "<|reserved_special_token_119|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128128": {
+ "content": "<|reserved_special_token_120|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128129": {
+ "content": "<|reserved_special_token_121|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128130": {
+ "content": "<|reserved_special_token_122|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128131": {
+ "content": "<|reserved_special_token_123|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128132": {
+ "content": "<|reserved_special_token_124|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128133": {
+ "content": "<|reserved_special_token_125|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128134": {
+ "content": "<|reserved_special_token_126|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128135": {
+ "content": "<|reserved_special_token_127|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128136": {
+ "content": "<|reserved_special_token_128|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128137": {
+ "content": "<|reserved_special_token_129|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128138": {
+ "content": "<|reserved_special_token_130|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128139": {
+ "content": "<|reserved_special_token_131|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128140": {
+ "content": "<|reserved_special_token_132|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128141": {
+ "content": "<|reserved_special_token_133|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128142": {
+ "content": "<|reserved_special_token_134|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128143": {
+ "content": "<|reserved_special_token_135|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128144": {
+ "content": "<|reserved_special_token_136|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128145": {
+ "content": "<|reserved_special_token_137|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128146": {
+ "content": "<|reserved_special_token_138|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128147": {
+ "content": "<|reserved_special_token_139|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128148": {
+ "content": "<|reserved_special_token_140|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128149": {
+ "content": "<|reserved_special_token_141|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128150": {
+ "content": "<|reserved_special_token_142|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128151": {
+ "content": "<|reserved_special_token_143|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128152": {
+ "content": "<|reserved_special_token_144|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128153": {
+ "content": "<|reserved_special_token_145|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128154": {
+ "content": "<|reserved_special_token_146|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128155": {
+ "content": "<|reserved_special_token_147|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128156": {
+ "content": "<|reserved_special_token_148|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128157": {
+ "content": "<|reserved_special_token_149|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128158": {
+ "content": "<|reserved_special_token_150|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128159": {
+ "content": "<|reserved_special_token_151|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128160": {
+ "content": "<|reserved_special_token_152|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128161": {
+ "content": "<|reserved_special_token_153|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128162": {
+ "content": "<|reserved_special_token_154|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128163": {
+ "content": "<|reserved_special_token_155|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128164": {
+ "content": "<|reserved_special_token_156|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128165": {
+ "content": "<|reserved_special_token_157|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128166": {
+ "content": "<|reserved_special_token_158|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128167": {
+ "content": "<|reserved_special_token_159|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128168": {
+ "content": "<|reserved_special_token_160|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128169": {
+ "content": "<|reserved_special_token_161|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128170": {
+ "content": "<|reserved_special_token_162|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128171": {
+ "content": "<|reserved_special_token_163|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128172": {
+ "content": "<|reserved_special_token_164|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128173": {
+ "content": "<|reserved_special_token_165|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128174": {
+ "content": "<|reserved_special_token_166|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128175": {
+ "content": "<|reserved_special_token_167|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128176": {
+ "content": "<|reserved_special_token_168|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128177": {
+ "content": "<|reserved_special_token_169|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128178": {
+ "content": "<|reserved_special_token_170|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128179": {
+ "content": "<|reserved_special_token_171|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128180": {
+ "content": "<|reserved_special_token_172|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128181": {
+ "content": "<|reserved_special_token_173|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128182": {
+ "content": "<|reserved_special_token_174|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128183": {
+ "content": "<|reserved_special_token_175|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128184": {
+ "content": "<|reserved_special_token_176|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128185": {
+ "content": "<|reserved_special_token_177|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128186": {
+ "content": "<|reserved_special_token_178|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128187": {
+ "content": "<|reserved_special_token_179|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128188": {
+ "content": "<|reserved_special_token_180|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128189": {
+ "content": "<|reserved_special_token_181|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128190": {
+ "content": "<|reserved_special_token_182|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128191": {
+ "content": "<|reserved_special_token_183|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128192": {
+ "content": "<|reserved_special_token_184|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128193": {
+ "content": "<|reserved_special_token_185|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128194": {
+ "content": "<|reserved_special_token_186|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128195": {
+ "content": "<|reserved_special_token_187|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128196": {
+ "content": "<|reserved_special_token_188|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128197": {
+ "content": "<|reserved_special_token_189|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128198": {
+ "content": "<|reserved_special_token_190|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128199": {
+ "content": "<|reserved_special_token_191|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128200": {
+ "content": "<|reserved_special_token_192|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128201": {
+ "content": "<|reserved_special_token_193|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128202": {
+ "content": "<|reserved_special_token_194|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128203": {
+ "content": "<|reserved_special_token_195|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128204": {
+ "content": "<|reserved_special_token_196|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128205": {
+ "content": "<|reserved_special_token_197|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128206": {
+ "content": "<|reserved_special_token_198|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128207": {
+ "content": "<|reserved_special_token_199|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128208": {
+ "content": "<|reserved_special_token_200|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128209": {
+ "content": "<|reserved_special_token_201|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128210": {
+ "content": "<|reserved_special_token_202|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128211": {
+ "content": "<|reserved_special_token_203|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128212": {
+ "content": "<|reserved_special_token_204|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128213": {
+ "content": "<|reserved_special_token_205|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128214": {
+ "content": "<|reserved_special_token_206|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128215": {
+ "content": "<|reserved_special_token_207|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128216": {
+ "content": "<|reserved_special_token_208|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128217": {
+ "content": "<|reserved_special_token_209|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128218": {
+ "content": "<|reserved_special_token_210|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128219": {
+ "content": "<|reserved_special_token_211|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128220": {
+ "content": "<|reserved_special_token_212|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128221": {
+ "content": "<|reserved_special_token_213|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128222": {
+ "content": "<|reserved_special_token_214|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128223": {
+ "content": "<|reserved_special_token_215|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128224": {
+ "content": "<|reserved_special_token_216|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128225": {
+ "content": "<|reserved_special_token_217|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128226": {
+ "content": "<|reserved_special_token_218|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128227": {
+ "content": "<|reserved_special_token_219|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128228": {
+ "content": "<|reserved_special_token_220|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128229": {
+ "content": "<|reserved_special_token_221|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128230": {
+ "content": "<|reserved_special_token_222|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128231": {
+ "content": "<|reserved_special_token_223|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128232": {
+ "content": "<|reserved_special_token_224|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128233": {
+ "content": "<|reserved_special_token_225|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128234": {
+ "content": "<|reserved_special_token_226|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128235": {
+ "content": "<|reserved_special_token_227|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128236": {
+ "content": "<|reserved_special_token_228|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128237": {
+ "content": "<|reserved_special_token_229|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128238": {
+ "content": "<|reserved_special_token_230|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128239": {
+ "content": "<|reserved_special_token_231|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128240": {
+ "content": "<|reserved_special_token_232|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128241": {
+ "content": "<|reserved_special_token_233|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128242": {
+ "content": "<|reserved_special_token_234|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128243": {
+ "content": "<|reserved_special_token_235|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128244": {
+ "content": "<|reserved_special_token_236|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128245": {
+ "content": "<|reserved_special_token_237|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128246": {
+ "content": "<|reserved_special_token_238|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128247": {
+ "content": "<|reserved_special_token_239|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128248": {
+ "content": "<|reserved_special_token_240|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128249": {
+ "content": "<|reserved_special_token_241|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128250": {
+ "content": "<|reserved_special_token_242|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
2010
+ },
2011
+ "128251": {
2012
+ "content": "<|reserved_special_token_243|>",
2013
+ "lstrip": false,
2014
+ "normalized": false,
2015
+ "rstrip": false,
2016
+ "single_word": false,
2017
+ "special": true
2018
+ },
2019
+ "128252": {
2020
+ "content": "<|reserved_special_token_244|>",
2021
+ "lstrip": false,
2022
+ "normalized": false,
2023
+ "rstrip": false,
2024
+ "single_word": false,
2025
+ "special": true
2026
+ },
2027
+ "128253": {
2028
+ "content": "<|reserved_special_token_245|>",
2029
+ "lstrip": false,
2030
+ "normalized": false,
2031
+ "rstrip": false,
2032
+ "single_word": false,
2033
+ "special": true
2034
+ },
2035
+ "128254": {
2036
+ "content": "<|reserved_special_token_246|>",
2037
+ "lstrip": false,
2038
+ "normalized": false,
2039
+ "rstrip": false,
2040
+ "single_word": false,
2041
+ "special": true
2042
+ },
2043
+ "128255": {
2044
+ "content": "<|reserved_special_token_247|>",
2045
+ "lstrip": false,
2046
+ "normalized": false,
2047
+ "rstrip": false,
2048
+ "single_word": false,
2049
+ "special": true
2050
+ }
2051
+ },
2052
+ "bos_token": "<|begin_of_text|>",
2053
+ "clean_up_tokenization_spaces": true,
2054
+ "eos_token": "<|eot_id|>",
2055
+ "extra_special_tokens": {},
2056
+ "model_input_names": [
2057
+ "input_ids",
2058
+ "attention_mask"
2059
+ ],
2060
+ "model_max_length": 131072,
2061
+ "pad_token": "<|end_of_text|>",
2062
+ "tokenizer_class": "PreTrainedTokenizerFast"
2063
+ }
ud.py ADDED
@@ -0,0 +1,159 @@
+ import numpy
+ from transformers import TokenClassificationPipeline
+
+ class BellmanFordTokenClassificationPipeline(TokenClassificationPipeline):
+   def __init__(self,**kwargs):
+     from copy import deepcopy
+     from tokenizers import Regex
+     from tokenizers.pre_tokenizers import Sequence,Split,Whitespace
+     super().__init__(**kwargs)
+     self.oldtokenizer=deepcopy(self.tokenizer)
+     self.tokenizer.backend_tokenizer.pre_tokenizer=Sequence([Whitespace(),Split(Regex("[\u0e40-\u0e44]?[\u0e01-\u0e2e][\u0e30-\u0e3a\u0e45\u0e47-\u0e4e]*|."),"isolated"),self.oldtokenizer.backend_tokenizer.pre_tokenizer])
+     x=self.model.config.label2id
+     y=[k for k in x if k.find("|")<0 and not k.startswith("I-")]
+     self.transition=numpy.full((len(x),len(x)),-numpy.inf)
+     for k,v in x.items():
+       if k.find("|")<0:
+         for j in ["I-"+k[2:]] if k.startswith("B-") else [k]+y if k.startswith("I-") else y:
+           self.transition[v,x[j]]=0
+   def check_model_type(self,supported_models):
+     pass
+   def postprocess(self,model_outputs,**kwargs):
+     if "logits" not in model_outputs:
+       return self.postprocess(model_outputs[0],**kwargs)
+     return self.bellman_ford_token_classification(model_outputs,**kwargs)
+   def bellman_ford_token_classification(self,model_outputs,**kwargs):
+     m=model_outputs["logits"][0].numpy()
+     e=numpy.exp(m-numpy.max(m,axis=-1,keepdims=True))
+     z=e/e.sum(axis=-1,keepdims=True)
+     for i in range(m.shape[0]-1,0,-1):
+       m[i-1]+=numpy.max(m[i]+self.transition,axis=1)
+     k=[numpy.argmax(m[0]+self.transition[0])]
+     for i in range(1,m.shape[0]):
+       k.append(numpy.argmax(m[i]+self.transition[k[-1]]))
+     w=[{"entity":self.model.config.id2label[j],"start":s,"end":e,"score":z[i,j]} for i,((s,e),j) in enumerate(zip(model_outputs["offset_mapping"][0].tolist(),k)) if s<e]
+     if "aggregation_strategy" in kwargs and kwargs["aggregation_strategy"]!="none":
+       for i,t in reversed(list(enumerate(w))):
+         p=t.pop("entity")
+         if p.startswith("I-"):
+           w[i-1]["score"]=min(w[i-1]["score"],t["score"])
+           w[i-1]["end"]=w.pop(i)["end"]
+         elif i>0 and w[i-1]["end"]>t["start"]:
+           w[i-1]["score"]=min(w[i-1]["score"],t["score"])
+           w[i-1]["end"]=w.pop(i)["end"]
+         elif p.startswith("B-"):
+           t["entity_group"]=p[2:]
+         else:
+           t["entity_group"]=p
+     for t in w:
+       t["text"]=model_outputs["sentence"][t["start"]:t["end"]]
+     return w
+
+ class UniversalDependenciesPipeline(BellmanFordTokenClassificationPipeline):
+   def __init__(self,**kwargs):
+     kwargs["aggregation_strategy"]="simple"
+     super().__init__(**kwargs)
+     x=self.model.config.label2id
+     self.root=numpy.full((len(x)),-numpy.inf)
+     self.left_arc=numpy.full((len(x)),-numpy.inf)
+     self.right_arc=numpy.full((len(x)),-numpy.inf)
+     for k,v in x.items():
+       if k.endswith("|root"):
+         self.root[v]=0
+       elif k.find("|l-")>0:
+         self.left_arc[v]=0
+       elif k.find("|r-")>0:
+         self.right_arc[v]=0
+   def postprocess(self,model_outputs,**kwargs):
+     import torch
+     kwargs["aggregation_strategy"]="simple"
+     if "logits" not in model_outputs:
+       return self.postprocess(model_outputs[0],**kwargs)
+     w=self.bellman_ford_token_classification(model_outputs,**kwargs)
+     off=[(t["start"],t["end"]) for t in w]
+     for i,(s,e) in reversed(list(enumerate(off))):
+       if s<e:
+         d=w[i]["text"]
+         j=len(d)-len(d.lstrip())
+         if j>0:
+           d=d.lstrip()
+           off[i]=(off[i][0]+j,off[i][1])
+         j=len(d)-len(d.rstrip())
+         if j>0:
+           d=d.rstrip()
+           off[i]=(off[i][0],off[i][1]-j)
+         if d.strip()=="":
+           off.pop(i)
+           w.pop(i)
+     v=self.oldtokenizer([t["text"] for t in w],add_special_tokens=False)
+     x=[not t["entity_group"].endswith(".") for t in w]
+     z=model_outputs["input_ids"][0]
+     if len(x)<510:
+       x=[True]*len(x)
+     else:
+       k=sum([len(x)-i+1 if b else 0 for i,b in enumerate(x)])+len(z)+1
+       for i in numpy.argsort(numpy.array([t["score"] for t in w])):
+         if x[i]==False and k+len(x)-i<131072:
+           x[i]=True
+           k+=len(x)-i+1
+     ids=[-1]
+     for i in range(len(x)):
+       if x[i]:
+         ids.append(i)
+         for j in range(i+1,len(x)):
+           ids.append(j)
+         ids.append(-1)
+     with torch.no_grad():
+       e=self.model.get_input_embeddings().weight
+       m=[]
+       for j in v["input_ids"]:
+         if j==[]:
+           j=[self.tokenizer.convert_tokens_to_ids("<|python_tag|>")]
+         m.append(e[j,:].sum(axis=0))
+       m.append(e[self.tokenizer.eos_token_id,:])
+       m=torch.stack(m).to(self.device)
+       e=self.model(inputs_embeds=torch.unsqueeze(torch.vstack((e[z,:],m[ids,:])),0))
+     m=e.logits[0].cpu().numpy()
+     e=numpy.full((len(x),len(x),m.shape[-1]),m.min())
+     k=len(z)+1
+     for i in range(len(x)):
+       if x[i]:
+         e[i,i]=m[k]+self.root
+         k+=1
+         for j in range(1,len(x)-i):
+           e[i+j,i]=m[k]+self.left_arc
+           e[i,i+j]=m[k]+self.right_arc
+           k+=1
+         k+=1
+     m,p=numpy.max(e,axis=2),numpy.argmax(e,axis=2)
+     h=self.chu_liu_edmonds(m)
+     z=[i for i,j in enumerate(h) if i==j]
+     if len(z)>1:
+       k,h=z[numpy.argmax(m[z,z])],numpy.min(m)-numpy.max(m)
+       m[:,z]+=[[0 if j in z and (i!=j or i==k) else h for i in z] for j in range(m.shape[0])]
+       h=self.chu_liu_edmonds(m)
+     q=[self.model.config.id2label[p[j,i]].split("|") for i,j in enumerate(h)]
+     t=model_outputs["sentence"].replace("\n"," ")
+     u="# text = "+t+"\n"
+     for i,(s,e) in enumerate(off):
+       u+="\t".join([str(i+1),t[s:e],"_",q[i][0],"_","_" if len(q[i])<3 else "|".join(q[i][1:-1]),str(0 if h[i]==i else h[i]+1),"root" if q[i][-1]=="root" else q[i][-1][2:],"_","_" if i+1<len(off) and e<off[i+1][0] else "SpaceAfter=No"])+"\n"
+     return u+"\n"
+   def chu_liu_edmonds(self,matrix):
+     h=numpy.argmax(matrix,axis=0)
+     x=[-1 if i==j else j for i,j in enumerate(h)]
+     for b in [lambda x,i,j:-1 if i not in x else x[i],lambda x,i,j:-1 if j<0 else x[j]]:
+       y=[]
+       while x!=y:
+         y=list(x)
+         for i,j in enumerate(x):
+           x[i]=b(x,i,j)
+       if max(x)<0:
+         return h
+     y,x=[i for i,j in enumerate(x) if j==max(x)],[i for i,j in enumerate(x) if j<max(x)]
+     z=matrix-numpy.max(matrix,axis=0)
+     m=numpy.block([[z[x,:][:,x],numpy.max(z[x,:][:,y],axis=1).reshape(len(x),1)],[numpy.max(z[y,:][:,x],axis=0),numpy.max(z[y,y])]])
+     k=[j if i==len(x) else x[j] if j<len(x) else y[numpy.argmax(z[y,x[i]])] for i,j in enumerate(self.chu_liu_edmonds(m))]
+     h=[j if i in y else k[x.index(i)] for i,j in enumerate(h)]
+     i=y[numpy.argmax(z[x[k[-1]],y] if k[-1]<len(x) else z[y,y])]
+     h[i]=x[k[-1]] if k[-1]<len(x) else i
+     return h
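
The constrained decoding in `bellman_ford_token_classification` can be tried in isolation with plain NumPy. What follows is a minimal sketch, not the code from this commit. The helper names `build_transition` and `constrained_decode` and the three toy labels are assumptions made for illustration. Labels containing `|` are left out, and the first label is chosen by a plain argmax rather than through `self.transition[0]`.

```python
import numpy as np

def build_transition(labels):
    """Return an (L, L) matrix: 0 where label row may be followed by label column, -inf otherwise."""
    idx = {k: i for i, k in enumerate(labels)}
    # Labels that may open a new word: everything that is not a continuation.
    starters = [k for k in labels if not k.startswith("I-")]
    t = np.full((len(labels), len(labels)), -np.inf)
    for k, v in idx.items():
        if k.startswith("B-"):
            nxt = ["I-" + k[2:]]      # a begun word must be continued
        elif k.startswith("I-"):
            nxt = [k] + starters      # continue the word or open a new one
        else:
            nxt = starters            # single-token word: open a new one
        for j in nxt:
            t[v, idx[j]] = 0
    return t

def constrained_decode(logits, transition):
    """Pick one label per token so that every consecutive pair is allowed."""
    m = logits.astype(float).copy()
    # Backward pass: add to each position the best score reachable afterwards.
    for i in range(m.shape[0] - 1, 0, -1):
        m[i - 1] += np.max(m[i] + transition, axis=1)
    # Forward pass: follow the best allowed label step by step.
    path = [int(np.argmax(m[0]))]
    for i in range(1, m.shape[0]):
        path.append(int(np.argmax(m[i] + transition[path[-1]])))
    return path

labels = ["NOUN", "B-VERB", "I-VERB"]  # toy label set, not the model's
logits = np.array([[0.0, 5.0, 0.0],
                   [3.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
transition = build_transition(labels)
greedy = np.argmax(logits, axis=1).tolist()
path = constrained_decode(logits, transition)
print([labels[i] for i in greedy])  # ['B-VERB', 'NOUN', 'NOUN'], an invalid sequence
print([labels[i] for i in path])    # ['B-VERB', 'I-VERB', 'NOUN']
```

The per-token argmax places `NOUN` directly after `B-VERB`, which the transition matrix forbids. The backward pass makes the forward choice aware of that, so the decoded sequence stays well-formed.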