KoichiYasuoka committed
Commit da14a06 · 1 Parent(s): 20f573a

initial release

README.md ADDED
@@ -0,0 +1,29 @@
+ ---
+ language:
+ - "th"
+ tags:
+ - "thai"
+ - "pos"
+ - "dependency-parsing"
+ - "modernbert"
+ base_model: KoichiYasuoka/modernbert-large-thai-wikipedia-upos
+ datasets:
+ - "universal_dependencies"
+ license: "apache-2.0"
+ pipeline_tag: "token-classification"
+ ---
+
+ # modernbert-large-thai-wikipedia-ud-embeds
+
+ ## Model Description
+
+ This is a ModernBERT model pretrained for POS-tagging and dependency-parsing, derived from [modernbert-large-thai-wikipedia-upos](https://huggingface.co/KoichiYasuoka/modernbert-large-thai-wikipedia-upos).
+
+ ## How to Use
+
+ ```py
+ from transformers import pipeline
+ nlp=pipeline("universal-dependencies","KoichiYasuoka/modernbert-large-thai-wikipedia-ud-embeds",trust_remote_code=True)
+ print(nlp("หลายหัวดีกว่าหัวเดียว"))
+ ```
+
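Besides the `universal-dependencies` pipeline shown in the README, the config.json added in this commit also registers a `upos` custom pipeline (`ud.BellmanFordTokenClassificationPipeline`), so plain UPOS token-classification should be available from the same checkpoint. A minimal sketch under that assumption (the remote `ud` module ships with the model and is loaded via `trust_remote_code`):

```py
from transformers import pipeline

# "upos" is declared under custom_pipelines in config.json;
# trust_remote_code=True lets transformers load the repo's ud module.
nlp = pipeline("upos", "KoichiYasuoka/modernbert-large-thai-wikipedia-ud-embeds", trust_remote_code=True)
print(nlp("หลายหัวดีกว่าหัวเดียว"))  # expected: a list of tokens tagged with UPOS labels
```

The `universal-dependencies` pipeline in the README example returns a full dependency parse instead; in Yasuoka's other UD models this is typically emitted as CoNLL-U text, though the exact output format is not shown in this commit.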
config.json ADDED
@@ -0,0 +1,2156 @@
1
+ {
2
+ "architectures": [
3
+ "ModernBertForTokenClassification"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration_modernbert.ModernBertConfig",
9
+ "AutoModel": "modeling_modernbert.ModernBertModel",
10
+ "AutoModelForMaskedLM": "modeling_modernbert.ModernBertForMaskedLM",
11
+ "AutoModelForSequenceClassification": "modeling_modernbert.ModernBertForSequenceClassification",
12
+ "AutoModelForTokenClassification": "modeling_modernbert.ModernBertForTokenClassification"
13
+ },
14
+ "bos_token_id": 0,
15
+ "classifier_activation": "gelu",
16
+ "classifier_bias": false,
17
+ "classifier_dropout": 0.0,
18
+ "classifier_pooling": "mean",
19
+ "cls_token_id": 0,
20
+ "custom_pipelines": {
21
+ "upos": {
22
+ "impl": "ud.BellmanFordTokenClassificationPipeline",
23
+ "pt": "AutoModelForTokenClassification"
24
+ },
25
+ "universal-dependencies": {
26
+ "impl": "ud.UniversalDependenciesPipeline",
27
+ "pt": "AutoModelForTokenClassification"
28
+ }
29
+ },
30
+ "decoder_bias": true,
31
+ "deterministic_flash_attn": false,
32
+ "embedding_dropout": 0.0,
33
+ "eos_token_id": 2,
34
+ "global_attn_every_n_layers": 3,
35
+ "global_rope_theta": 160000.0,
36
+ "gradient_checkpointing": false,
37
+ "hidden_activation": "gelu",
38
+ "hidden_size": 1024,
39
+ "id2label": {
40
+ "0": "ADP",
41
+ "1": "ADP.",
42
+ "2": "ADP|Foreign=Yes|_",
43
+ "3": "ADP|Foreign=Yes|l-case",
44
+ "4": "ADP|NounType=Class|_",
45
+ "5": "ADP|NounType=Class|l-case",
46
+ "6": "ADP|Prefix=Yes|_",
47
+ "7": "ADP|Prefix=Yes|l-case",
48
+ "8": "ADP|Prefix=Yes|l-mark",
49
+ "9": "ADP|_",
50
+ "10": "ADP|l-acl",
51
+ "11": "ADP|l-advcl",
52
+ "12": "ADP|l-advmod",
53
+ "13": "ADP|l-case",
54
+ "14": "ADP|l-cc",
55
+ "15": "ADP|l-cc:preconj",
56
+ "16": "ADP|l-csubj",
57
+ "17": "ADP|l-dep",
58
+ "18": "ADP|l-fixed",
59
+ "19": "ADP|l-flat",
60
+ "20": "ADP|l-mark",
61
+ "21": "ADP|l-nmod",
62
+ "22": "ADP|l-nsubj",
63
+ "23": "ADP|l-obl",
64
+ "24": "ADP|l-orphan",
65
+ "25": "ADP|r-acl",
66
+ "26": "ADP|r-advmod",
67
+ "27": "ADP|r-appos",
68
+ "28": "ADP|r-case",
69
+ "29": "ADP|r-compound",
70
+ "30": "ADP|r-conj",
71
+ "31": "ADP|r-fixed",
72
+ "32": "ADP|r-flat",
73
+ "33": "ADP|r-mark",
74
+ "34": "ADP|r-obl",
75
+ "35": "ADP|r-orphan",
76
+ "36": "ADP|root",
77
+ "37": "ADV",
78
+ "38": "ADV.",
79
+ "39": "ADV|Foreign=Yes|_",
80
+ "40": "ADV|Foreign=Yes|l-advmod",
81
+ "41": "ADV|Foreign=Yes|r-advmod",
82
+ "42": "ADV|NumType=Mult|_",
83
+ "43": "ADV|NumType=Mult|r-advmod",
84
+ "44": "ADV|PartType=Adv|_",
85
+ "45": "ADV|PartType=Adv|l-advmod",
86
+ "46": "ADV|PartType=Adv|l-mark",
87
+ "47": "ADV|PartType=Adv|r-advmod",
88
+ "48": "ADV|PartType=Enp|_",
89
+ "49": "ADV|PartType=Enp|l-advmod",
90
+ "50": "ADV|PartType=Enp|r-advmod",
91
+ "51": "ADV|PartType=Int|_",
92
+ "52": "ADV|PartType=Int|r-advmod",
93
+ "53": "ADV|PartType=Int|r-fixed",
94
+ "54": "ADV|Prefix=Yes|_",
95
+ "55": "ADV|Prefix=Yes|l-advmod",
96
+ "56": "ADV|Prefix=Yes|l-mark",
97
+ "57": "ADV|Prefix=Yes|r-advmod",
98
+ "58": "ADV|PronType=Int|_",
99
+ "59": "ADV|PronType=Int|l-advmod",
100
+ "60": "ADV|PronType=Int|r-advmod",
101
+ "61": "ADV|_",
102
+ "62": "ADV|l-acl",
103
+ "63": "ADV|l-advcl",
104
+ "64": "ADV|l-advmod",
105
+ "65": "ADV|l-aux",
106
+ "66": "ADV|l-case",
107
+ "67": "ADV|l-cc",
108
+ "68": "ADV|l-compound",
109
+ "69": "ADV|l-dep",
110
+ "70": "ADV|l-det",
111
+ "71": "ADV|l-discourse",
112
+ "72": "ADV|l-fixed",
113
+ "73": "ADV|l-mark",
114
+ "74": "ADV|l-orphan",
115
+ "75": "ADV|l-xcomp",
116
+ "76": "ADV|r-acl",
117
+ "77": "ADV|r-advcl",
118
+ "78": "ADV|r-advmod",
119
+ "79": "ADV|r-aux",
120
+ "80": "ADV|r-ccomp",
121
+ "81": "ADV|r-compound",
122
+ "82": "ADV|r-conj",
123
+ "83": "ADV|r-det",
124
+ "84": "ADV|r-fixed",
125
+ "85": "ADV|r-flat",
126
+ "86": "ADV|r-mark",
127
+ "87": "ADV|r-nmod",
128
+ "88": "ADV|r-obj",
129
+ "89": "ADV|r-orphan",
130
+ "90": "ADV|r-xcomp",
131
+ "91": "ADV|root",
132
+ "92": "AUX",
133
+ "93": "AUX.",
134
+ "94": "AUX|Foreign=Yes|_",
135
+ "95": "AUX|Foreign=Yes|l-aux",
136
+ "96": "AUX|Mood=Imp|_",
137
+ "97": "AUX|Mood=Imp|l-aux",
138
+ "98": "AUX|NounType=Class|_",
139
+ "99": "AUX|NounType=Class|r-appos",
140
+ "100": "AUX|Prefix=Yes|_",
141
+ "101": "AUX|Prefix=Yes|l-aux",
142
+ "102": "AUX|Prefix=Yes|r-aux",
143
+ "103": "AUX|VerbType=Cop|_",
144
+ "104": "AUX|VerbType=Cop|l-acl",
145
+ "105": "AUX|VerbType=Cop|l-advcl",
146
+ "106": "AUX|VerbType=Cop|l-aux",
147
+ "107": "AUX|VerbType=Cop|l-cop",
148
+ "108": "AUX|VerbType=Cop|r-acl",
149
+ "109": "AUX|VerbType=Cop|r-advcl",
150
+ "110": "AUX|VerbType=Cop|r-aux",
151
+ "111": "AUX|VerbType=Cop|r-conj",
152
+ "112": "AUX|VerbType=Cop|r-mark",
153
+ "113": "AUX|VerbType=Cop|root",
154
+ "114": "AUX|Voice=Pass|_",
155
+ "115": "AUX|Voice=Pass|l-aux",
156
+ "116": "AUX|Voice=Pass|l-aux:pass",
157
+ "117": "AUX|Voice=Pass|r-aux:pass",
158
+ "118": "AUX|_",
159
+ "119": "AUX|l-advmod",
160
+ "120": "AUX|l-aux",
161
+ "121": "AUX|l-aux:pass",
162
+ "122": "AUX|l-cop",
163
+ "123": "AUX|l-mark",
164
+ "124": "AUX|r-acl",
165
+ "125": "AUX|r-advmod",
166
+ "126": "AUX|r-aux",
167
+ "127": "AUX|r-ccomp",
168
+ "128": "AUX|r-clf",
169
+ "129": "AUX|r-compound",
170
+ "130": "AUX|r-conj",
171
+ "131": "AUX|r-fixed",
172
+ "132": "AUX|r-mark",
173
+ "133": "AUX|root",
174
+ "134": "B-ADP",
175
+ "135": "B-ADP.",
176
+ "136": "B-ADV",
177
+ "137": "B-ADV.",
178
+ "138": "B-AUX",
179
+ "139": "B-AUX.",
180
+ "140": "B-CCONJ",
181
+ "141": "B-CCONJ.",
182
+ "142": "B-DET",
183
+ "143": "B-DET.",
184
+ "144": "B-INTJ",
185
+ "145": "B-INTJ.",
186
+ "146": "B-NOUN",
187
+ "147": "B-NOUN.",
188
+ "148": "B-NUM",
189
+ "149": "B-NUM.",
190
+ "150": "B-PART",
191
+ "151": "B-PART.",
192
+ "152": "B-PRON",
193
+ "153": "B-PRON.",
194
+ "154": "B-PROPN",
195
+ "155": "B-PROPN.",
196
+ "156": "B-PUNCT",
197
+ "157": "B-PUNCT.",
198
+ "158": "B-SCONJ",
199
+ "159": "B-SCONJ.",
200
+ "160": "B-SYM",
201
+ "161": "B-SYM.",
202
+ "162": "B-VERB",
203
+ "163": "B-VERB.",
204
+ "164": "CCONJ",
205
+ "165": "CCONJ.",
206
+ "166": "CCONJ|Foreign=Yes|_",
207
+ "167": "CCONJ|Foreign=Yes|l-cc",
208
+ "168": "CCONJ|PronType=Prs|_",
209
+ "169": "CCONJ|PronType=Prs|l-cc",
210
+ "170": "CCONJ|_",
211
+ "171": "CCONJ|l-advmod",
212
+ "172": "CCONJ|l-case",
213
+ "173": "CCONJ|l-cc",
214
+ "174": "CCONJ|l-conj",
215
+ "175": "CCONJ|l-discourse",
216
+ "176": "CCONJ|l-fixed",
217
+ "177": "CCONJ|l-flat",
218
+ "178": "CCONJ|l-mark",
219
+ "179": "CCONJ|l-nsubj",
220
+ "180": "CCONJ|l-obj",
221
+ "181": "CCONJ|l-orphan",
222
+ "182": "CCONJ|r-cc",
223
+ "183": "CCONJ|r-compound",
224
+ "184": "CCONJ|r-conj",
225
+ "185": "CCONJ|r-fixed",
226
+ "186": "CCONJ|r-mark",
227
+ "187": "CCONJ|r-obl",
228
+ "188": "CCONJ|root",
229
+ "189": "DET",
230
+ "190": "DET.",
231
+ "191": "DET|NumType=Mult|_",
232
+ "192": "DET|NumType=Mult|l-det",
233
+ "193": "DET|PartType=Emp|_",
234
+ "194": "DET|PartType=Emp|r-det",
235
+ "195": "DET|PartType=Int|_",
236
+ "196": "DET|PartType=Int|r-det",
237
+ "197": "DET|PronType=Int|_",
238
+ "198": "DET|PronType=Int|r-det",
239
+ "199": "DET|_",
240
+ "200": "DET|l-advmod",
241
+ "201": "DET|l-case",
242
+ "202": "DET|l-cc:preconj",
243
+ "203": "DET|l-compound",
244
+ "204": "DET|l-det",
245
+ "205": "DET|l-det:predet",
246
+ "206": "DET|l-discourse",
247
+ "207": "DET|l-mark",
248
+ "208": "DET|l-nsubj",
249
+ "209": "DET|l-nsubj:pass",
250
+ "210": "DET|l-obj",
251
+ "211": "DET|l-obl",
252
+ "212": "DET|l-obl:tmod",
253
+ "213": "DET|l-orphan",
254
+ "214": "DET|r-advmod",
255
+ "215": "DET|r-compound",
256
+ "216": "DET|r-conj",
257
+ "217": "DET|r-dep",
258
+ "218": "DET|r-det",
259
+ "219": "DET|r-fixed",
260
+ "220": "DET|r-flat",
261
+ "221": "DET|r-list",
262
+ "222": "DET|r-nmod",
263
+ "223": "DET|r-nummod",
264
+ "224": "DET|r-obj",
265
+ "225": "DET|r-obl",
266
+ "226": "DET|r-orphan",
267
+ "227": "DET|root",
268
+ "228": "I-ADP",
269
+ "229": "I-ADP.",
270
+ "230": "I-ADV",
271
+ "231": "I-ADV.",
272
+ "232": "I-AUX",
273
+ "233": "I-AUX.",
274
+ "234": "I-CCONJ",
275
+ "235": "I-CCONJ.",
276
+ "236": "I-DET",
277
+ "237": "I-DET.",
278
+ "238": "I-INTJ",
279
+ "239": "I-INTJ.",
280
+ "240": "I-NOUN",
281
+ "241": "I-NOUN.",
282
+ "242": "I-NUM",
283
+ "243": "I-NUM.",
284
+ "244": "I-PART",
285
+ "245": "I-PART.",
286
+ "246": "I-PRON",
287
+ "247": "I-PRON.",
288
+ "248": "I-PROPN",
289
+ "249": "I-PROPN.",
290
+ "250": "I-PUNCT",
291
+ "251": "I-PUNCT.",
292
+ "252": "I-SCONJ",
293
+ "253": "I-SCONJ.",
294
+ "254": "I-SYM",
295
+ "255": "I-SYM.",
296
+ "256": "I-VERB",
297
+ "257": "I-VERB.",
298
+ "258": "INTJ",
299
+ "259": "INTJ.",
300
+ "260": "INTJ|_",
301
+ "261": "INTJ|l-nsubj",
302
+ "262": "INTJ|r-acl",
303
+ "263": "INTJ|root",
304
+ "264": "NOUN",
305
+ "265": "NOUN.",
306
+ "266": "NOUN|Abbr=Yes|Foreign=Yes|_",
307
+ "267": "NOUN|Abbr=Yes|Foreign=Yes|r-nmod",
308
+ "268": "NOUN|Abbr=Yes|Prefix=Yes|_",
309
+ "269": "NOUN|Abbr=Yes|Prefix=Yes|l-flat",
310
+ "270": "NOUN|Abbr=Yes|_",
311
+ "271": "NOUN|Abbr=Yes|l-flat",
312
+ "272": "NOUN|Abbr=Yes|l-nmod",
313
+ "273": "NOUN|Abbr=Yes|l-nsubj",
314
+ "274": "NOUN|Abbr=Yes|l-obl",
315
+ "275": "NOUN|Abbr=Yes|r-acl",
316
+ "276": "NOUN|Abbr=Yes|r-appos",
317
+ "277": "NOUN|Abbr=Yes|r-clf",
318
+ "278": "NOUN|Abbr=Yes|r-conj",
319
+ "279": "NOUN|Abbr=Yes|r-fixed",
320
+ "280": "NOUN|Abbr=Yes|r-flat",
321
+ "281": "NOUN|Abbr=Yes|r-nmod",
322
+ "282": "NOUN|Abbr=Yes|r-obj",
323
+ "283": "NOUN|Abbr=Yes|r-obl",
324
+ "284": "NOUN|Foreign=Yes|NounType=Class|_",
325
+ "285": "NOUN|Foreign=Yes|NounType=Class|r-clf",
326
+ "286": "NOUN|Foreign=Yes|NounType=Class|r-obj",
327
+ "287": "NOUN|Foreign=Yes|Prefix=Yes|_",
328
+ "288": "NOUN|Foreign=Yes|Prefix=Yes|l-flat",
329
+ "289": "NOUN|Foreign=Yes|Prefix=Yes|r-appos",
330
+ "290": "NOUN|Foreign=Yes|_",
331
+ "291": "NOUN|Foreign=Yes|l-dislocated",
332
+ "292": "NOUN|Foreign=Yes|l-flat",
333
+ "293": "NOUN|Foreign=Yes|l-nmod",
334
+ "294": "NOUN|Foreign=Yes|l-nsubj",
335
+ "295": "NOUN|Foreign=Yes|l-obl",
336
+ "296": "NOUN|Foreign=Yes|r-acl",
337
+ "297": "NOUN|Foreign=Yes|r-advcl",
338
+ "298": "NOUN|Foreign=Yes|r-advmod",
339
+ "299": "NOUN|Foreign=Yes|r-appos",
340
+ "300": "NOUN|Foreign=Yes|r-ccomp",
341
+ "301": "NOUN|Foreign=Yes|r-clf",
342
+ "302": "NOUN|Foreign=Yes|r-compound",
343
+ "303": "NOUN|Foreign=Yes|r-conj",
344
+ "304": "NOUN|Foreign=Yes|r-flat",
345
+ "305": "NOUN|Foreign=Yes|r-iobj",
346
+ "306": "NOUN|Foreign=Yes|r-list",
347
+ "307": "NOUN|Foreign=Yes|r-nmod",
348
+ "308": "NOUN|Foreign=Yes|r-obj",
349
+ "309": "NOUN|Foreign=Yes|r-obl",
350
+ "310": "NOUN|Foreign=Yes|r-xcomp",
351
+ "311": "NOUN|Foreign=Yes|root",
352
+ "312": "NOUN|NameType=Com|_",
353
+ "313": "NOUN|NameType=Com|r-nmod",
354
+ "314": "NOUN|NameType=Geo|_",
355
+ "315": "NOUN|NameType=Geo|l-nsubj",
356
+ "316": "NOUN|NameType=Geo|r-nmod",
357
+ "317": "NOUN|NameType=Geo|r-obj",
358
+ "318": "NOUN|NameType=Nat|_",
359
+ "319": "NOUN|NameType=Nat|r-nmod",
360
+ "320": "NOUN|NameType=Oth|_",
361
+ "321": "NOUN|NameType=Oth|l-nsubj",
362
+ "322": "NOUN|NameType=Oth|r-conj",
363
+ "323": "NOUN|NameType=Oth|r-flat",
364
+ "324": "NOUN|NameType=Oth|r-nmod",
365
+ "325": "NOUN|NameType=Pro|_",
366
+ "326": "NOUN|NameType=Pro|r-nmod",
367
+ "327": "NOUN|NameType=Prs|_",
368
+ "328": "NOUN|NameType=Prs|l-nsubj",
369
+ "329": "NOUN|NameType=Prs|r-nmod",
370
+ "330": "NOUN|NounType=Class|Prefix=Yes|_",
371
+ "331": "NOUN|NounType=Class|Prefix=Yes|l-advcl",
372
+ "332": "NOUN|NounType=Class|Prefix=Yes|l-advmod",
373
+ "333": "NOUN|NounType=Class|Prefix=Yes|l-mark",
374
+ "334": "NOUN|NounType=Class|Prefix=Yes|l-nmod",
375
+ "335": "NOUN|NounType=Class|Prefix=Yes|l-nsubj",
376
+ "336": "NOUN|NounType=Class|Prefix=Yes|r-advcl",
377
+ "337": "NOUN|NounType=Class|Prefix=Yes|r-clf",
378
+ "338": "NOUN|NounType=Class|Prefix=Yes|r-nmod",
379
+ "339": "NOUN|NounType=Class|Prefix=Yes|r-obj",
380
+ "340": "NOUN|NounType=Class|_",
381
+ "341": "NOUN|NounType=Class|l-advcl",
382
+ "342": "NOUN|NounType=Class|l-advmod",
383
+ "343": "NOUN|NounType=Class|l-clf",
384
+ "344": "NOUN|NounType=Class|l-dislocated",
385
+ "345": "NOUN|NounType=Class|l-nmod",
386
+ "346": "NOUN|NounType=Class|l-nsubj",
387
+ "347": "NOUN|NounType=Class|l-obj",
388
+ "348": "NOUN|NounType=Class|l-obl",
389
+ "349": "NOUN|NounType=Class|r-acl",
390
+ "350": "NOUN|NounType=Class|r-advcl",
391
+ "351": "NOUN|NounType=Class|r-advmod",
392
+ "352": "NOUN|NounType=Class|r-appos",
393
+ "353": "NOUN|NounType=Class|r-cc",
394
+ "354": "NOUN|NounType=Class|r-ccomp",
395
+ "355": "NOUN|NounType=Class|r-clf",
396
+ "356": "NOUN|NounType=Class|r-compound",
397
+ "357": "NOUN|NounType=Class|r-conj",
398
+ "358": "NOUN|NounType=Class|r-dislocated",
399
+ "359": "NOUN|NounType=Class|r-fixed",
400
+ "360": "NOUN|NounType=Class|r-flat",
401
+ "361": "NOUN|NounType=Class|r-iobj",
402
+ "362": "NOUN|NounType=Class|r-list",
403
+ "363": "NOUN|NounType=Class|r-nmod",
404
+ "364": "NOUN|NounType=Class|r-nummod",
405
+ "365": "NOUN|NounType=Class|r-obj",
406
+ "366": "NOUN|NounType=Class|r-obl",
407
+ "367": "NOUN|NounType=Class|r-orphan",
408
+ "368": "NOUN|NounType=Class|r-xcomp",
409
+ "369": "NOUN|NounType=Class|root",
410
+ "370": "NOUN|NumType=Mult|_",
411
+ "371": "NOUN|NumType=Mult|r-advcl",
412
+ "372": "NOUN|NumType=Mult|r-nmod",
413
+ "373": "NOUN|NumType=Mult|r-obj",
414
+ "374": "NOUN|PartType=Enp|_",
415
+ "375": "NOUN|PartType=Enp|r-obj",
416
+ "376": "NOUN|PartType=Enp|r-obl",
417
+ "377": "NOUN|PartType=Int|_",
418
+ "378": "NOUN|PartType=Int|r-obj",
419
+ "379": "NOUN|PartType=Res|_",
420
+ "380": "NOUN|PartType=Res|r-nmod",
421
+ "381": "NOUN|PartType=Res|r-obj",
422
+ "382": "NOUN|Prefix=Yes|_",
423
+ "383": "NOUN|Prefix=Yes|l-acl",
424
+ "384": "NOUN|Prefix=Yes|l-advcl",
425
+ "385": "NOUN|Prefix=Yes|l-clf",
426
+ "386": "NOUN|Prefix=Yes|l-csubj",
427
+ "387": "NOUN|Prefix=Yes|l-dislocated",
428
+ "388": "NOUN|Prefix=Yes|l-flat",
429
+ "389": "NOUN|Prefix=Yes|l-nmod",
430
+ "390": "NOUN|Prefix=Yes|l-nsubj",
431
+ "391": "NOUN|Prefix=Yes|l-obj",
432
+ "392": "NOUN|Prefix=Yes|l-obl",
433
+ "393": "NOUN|Prefix=Yes|r-acl",
434
+ "394": "NOUN|Prefix=Yes|r-advcl",
435
+ "395": "NOUN|Prefix=Yes|r-advmod",
436
+ "396": "NOUN|Prefix=Yes|r-appos",
437
+ "397": "NOUN|Prefix=Yes|r-case",
438
+ "398": "NOUN|Prefix=Yes|r-cc",
439
+ "399": "NOUN|Prefix=Yes|r-ccomp",
440
+ "400": "NOUN|Prefix=Yes|r-clf",
441
+ "401": "NOUN|Prefix=Yes|r-compound",
442
+ "402": "NOUN|Prefix=Yes|r-conj",
443
+ "403": "NOUN|Prefix=Yes|r-dislocated",
444
+ "404": "NOUN|Prefix=Yes|r-fixed",
445
+ "405": "NOUN|Prefix=Yes|r-flat",
446
+ "406": "NOUN|Prefix=Yes|r-iobj",
447
+ "407": "NOUN|Prefix=Yes|r-list",
448
+ "408": "NOUN|Prefix=Yes|r-nmod",
449
+ "409": "NOUN|Prefix=Yes|r-nummod",
450
+ "410": "NOUN|Prefix=Yes|r-obj",
451
+ "411": "NOUN|Prefix=Yes|r-obl",
452
+ "412": "NOUN|Prefix=Yes|r-orphan",
453
+ "413": "NOUN|Prefix=Yes|r-xcomp",
454
+ "414": "NOUN|Prefix=Yes|root",
455
+ "415": "NOUN|_",
456
+ "416": "NOUN|l-acl",
457
+ "417": "NOUN|l-advcl",
458
+ "418": "NOUN|l-advmod",
459
+ "419": "NOUN|l-aux",
460
+ "420": "NOUN|l-case",
461
+ "421": "NOUN|l-cc",
462
+ "422": "NOUN|l-ccomp",
463
+ "423": "NOUN|l-compound",
464
+ "424": "NOUN|l-csubj",
465
+ "425": "NOUN|l-discourse",
466
+ "426": "NOUN|l-dislocated",
467
+ "427": "NOUN|l-expl",
468
+ "428": "NOUN|l-flat",
469
+ "429": "NOUN|l-iobj",
470
+ "430": "NOUN|l-mark",
471
+ "431": "NOUN|l-nmod",
472
+ "432": "NOUN|l-nsubj",
473
+ "433": "NOUN|l-nsubj:pass",
474
+ "434": "NOUN|l-nummod",
475
+ "435": "NOUN|l-obj",
476
+ "436": "NOUN|l-obl",
477
+ "437": "NOUN|l-obl:tmod",
478
+ "438": "NOUN|l-orphan",
479
+ "439": "NOUN|l-vocative",
480
+ "440": "NOUN|r-acl",
481
+ "441": "NOUN|r-acl:relcl",
482
+ "442": "NOUN|r-advcl",
483
+ "443": "NOUN|r-advmod",
484
+ "444": "NOUN|r-appos",
485
+ "445": "NOUN|r-case",
486
+ "446": "NOUN|r-cc",
487
+ "447": "NOUN|r-ccomp",
488
+ "448": "NOUN|r-clf",
489
+ "449": "NOUN|r-compound",
490
+ "450": "NOUN|r-conj",
491
+ "451": "NOUN|r-cop",
492
+ "452": "NOUN|r-discourse",
493
+ "453": "NOUN|r-dislocated",
494
+ "454": "NOUN|r-fixed",
495
+ "455": "NOUN|r-flat",
496
+ "456": "NOUN|r-flat:name",
497
+ "457": "NOUN|r-iobj",
498
+ "458": "NOUN|r-list",
499
+ "459": "NOUN|r-mark",
500
+ "460": "NOUN|r-nmod",
501
+ "461": "NOUN|r-nmod:poss",
502
+ "462": "NOUN|r-nsubj",
503
+ "463": "NOUN|r-nummod",
504
+ "464": "NOUN|r-obj",
505
+ "465": "NOUN|r-obl",
506
+ "466": "NOUN|r-obl:poss",
507
+ "467": "NOUN|r-obl:tmod",
508
+ "468": "NOUN|r-orphan",
509
+ "469": "NOUN|r-parataxis",
510
+ "470": "NOUN|r-xcomp",
511
+ "471": "NOUN|root",
512
+ "472": "NUM",
513
+ "473": "NUM.",
514
+ "474": "NUM|Abbr=Yes|_",
515
+ "475": "NUM|Abbr=Yes|r-flat",
516
+ "476": "NUM|Abbr=Yes|r-nummod",
517
+ "477": "NUM|Abbr=Yes|r-obj",
518
+ "478": "NUM|Foreign=Yes|_",
519
+ "479": "NUM|Foreign=Yes|r-clf",
520
+ "480": "NUM|NumType=Mult|_",
521
+ "481": "NUM|NumType=Mult|l-advmod",
522
+ "482": "NUM|NumType=Mult|l-nummod",
523
+ "483": "NUM|NumType=Mult|r-advmod",
524
+ "484": "NUM|Prefix=Yes|_",
525
+ "485": "NUM|Prefix=Yes|l-nummod",
526
+ "486": "NUM|_",
527
+ "487": "NUM|l-advcl",
528
+ "488": "NUM|l-advmod",
529
+ "489": "NUM|l-case",
530
+ "490": "NUM|l-clf",
531
+ "491": "NUM|l-dep",
532
+ "492": "NUM|l-flat",
533
+ "493": "NUM|l-nmod",
534
+ "494": "NUM|l-nsubj",
535
+ "495": "NUM|l-nummod",
536
+ "496": "NUM|l-obl",
537
+ "497": "NUM|l-obl:tmod",
538
+ "498": "NUM|r-acl",
539
+ "499": "NUM|r-acl:relcl",
540
+ "500": "NUM|r-advmod",
541
+ "501": "NUM|r-appos",
542
+ "502": "NUM|r-ccomp",
543
+ "503": "NUM|r-clf",
544
+ "504": "NUM|r-compound",
545
+ "505": "NUM|r-conj",
546
+ "506": "NUM|r-det",
547
+ "507": "NUM|r-fixed",
548
+ "508": "NUM|r-flat",
549
+ "509": "NUM|r-flat:name",
550
+ "510": "NUM|r-iobj",
551
+ "511": "NUM|r-nmod",
552
+ "512": "NUM|r-nummod",
553
+ "513": "NUM|r-obj",
554
+ "514": "NUM|r-obl",
555
+ "515": "NUM|r-obl:poss",
556
+ "516": "NUM|r-obl:tmod",
557
+ "517": "NUM|r-xcomp",
558
+ "518": "NUM|root",
559
+ "519": "PART",
560
+ "520": "PART.",
561
+ "521": "PART|Aspect=Perf|_",
562
+ "522": "PART|Aspect=Perf|l-aux",
563
+ "523": "PART|Aspect=Perf|r-aux",
564
+ "524": "PART|Aspect=Perf|r-xcomp",
565
+ "525": "PART|Aspect=Prog|_",
566
+ "526": "PART|Aspect=Prog|l-aux",
567
+ "527": "PART|Aspect=Prog|r-aux",
568
+ "528": "PART|NameType=Oth|_",
569
+ "529": "PART|NameType=Oth|l-advmod",
570
+ "530": "PART|NounType=Class|PartType=Emp|Prefix=Yes|_",
571
+ "531": "PART|NounType=Class|PartType=Emp|Prefix=Yes|l-mark",
572
+ "532": "PART|NounType=Class|PartType=Emp|_",
573
+ "533": "PART|NounType=Class|PartType=Emp|l-mark",
574
+ "534": "PART|NounType=Class|Prefix=Yes|_",
575
+ "535": "PART|NounType=Class|Prefix=Yes|l-mark",
576
+ "536": "PART|NumType=Mult|PartType=Emp|_",
577
+ "537": "PART|NumType=Mult|PartType=Emp|l-mark",
578
+ "538": "PART|PartType=Adj|_",
579
+ "539": "PART|PartType=Adj|l-mark",
580
+ "540": "PART|PartType=Adj|l-orphan",
581
+ "541": "PART|PartType=Adj|r-acl",
582
+ "542": "PART|PartType=Adj|r-compound",
583
+ "543": "PART|PartType=Adj|r-nmod",
584
+ "544": "PART|PartType=Adv|_",
585
+ "545": "PART|PartType=Adv|l-advmod",
586
+ "546": "PART|PartType=Adv|l-mark",
587
+ "547": "PART|PartType=Adv|r-advmod",
588
+ "548": "PART|PartType=Emp|Prefix=Yes|_",
589
+ "549": "PART|PartType=Emp|Prefix=Yes|l-advmod",
590
+ "550": "PART|PartType=Emp|Prefix=Yes|l-aux",
591
+ "551": "PART|PartType=Emp|Prefix=Yes|l-mark",
592
+ "552": "PART|PartType=Emp|_",
593
+ "553": "PART|PartType=Emp|l-advmod",
594
+ "554": "PART|PartType=Emp|l-case",
595
+ "555": "PART|PartType=Emp|l-discourse",
596
+ "556": "PART|PartType=Emp|l-mark",
597
+ "557": "PART|PartType=Emp|r-acl",
598
+ "558": "PART|PartType=Emp|r-advmod",
599
+ "559": "PART|PartType=Emp|r-aux",
600
+ "560": "PART|PartType=Emp|r-compound",
601
+ "561": "PART|PartType=Emp|r-det",
602
+ "562": "PART|PartType=Emp|r-fixed",
603
+ "563": "PART|PartType=Emp|r-mark",
604
+ "564": "PART|PartType=Emp|r-nmod",
605
+ "565": "PART|PartType=Enp|_",
606
+ "566": "PART|PartType=Enp|l-discourse",
607
+ "567": "PART|PartType=Enp|r-acl",
608
+ "568": "PART|PartType=Enp|r-advmod",
609
+ "569": "PART|PartType=Enp|r-compound",
610
+ "570": "PART|PartType=Enp|r-dep",
611
+ "571": "PART|PartType=Enp|r-det",
612
+ "572": "PART|PartType=Enp|r-discourse",
613
+ "573": "PART|PartType=Enp|r-fixed",
614
+ "574": "PART|PartType=Enp|r-obl",
615
+ "575": "PART|PartType=Int|_",
616
+ "576": "PART|PartType=Int|l-advmod",
617
+ "577": "PART|PartType=Int|l-mark",
618
+ "578": "PART|PartType=Int|r-acl",
619
+ "579": "PART|PartType=Int|r-advmod",
620
+ "580": "PART|PartType=Int|r-dep",
621
+ "581": "PART|PartType=Int|r-discourse",
622
+ "582": "PART|PartType=Int|r-nmod",
623
+ "583": "PART|PartType=Int|r-obj",
624
+ "584": "PART|PartType=Int|r-obl",
625
+ "585": "PART|PartType=Neg|_",
626
+ "586": "PART|PartType=Neg|l-advcl",
627
+ "587": "PART|PartType=Neg|l-advmod",
628
+ "588": "PART|PartType=Neg|l-aux",
629
+ "589": "PART|PartType=Neg|l-mark",
630
+ "590": "PART|PartType=Neg|r-acl",
631
+ "591": "PART|PartType=Neg|r-advmod",
632
+ "592": "PART|PartType=Neg|r-fixed",
633
+ "593": "PART|PartType=Res|_",
634
+ "594": "PART|PartType=Res|r-advmod",
635
+ "595": "PART|PartType=Res|r-discourse",
636
+ "596": "PART|PartType=Res|r-fixed",
637
+ "597": "PART|Polarity=Neg|_",
638
+ "598": "PART|Polarity=Neg|l-advmod",
639
+ "599": "PART|Prefix=Yes|_",
640
+ "600": "PART|Prefix=Yes|l-advmod",
641
+ "601": "PART|Prefix=Yes|l-aux",
642
+ "602": "PART|Prefix=Yes|l-mark",
643
+ "603": "PART|Prefix=Yes|r-acl",
644
+ "604": "PART|Prefix=Yes|r-nmod",
645
+ "605": "PART|PronType=Int|_",
646
+ "606": "PART|PronType=Int|r-acl",
647
+ "607": "PART|PronType=Int|r-advmod",
648
+ "608": "PART|PronType=Int|r-discourse",
649
+ "609": "PART|PronType=Int|r-obj",
650
+ "610": "PART|PronType=Int|root",
651
+ "611": "PART|_",
652
+ "612": "PART|l-advmod",
653
+ "613": "PART|l-cc",
654
+ "614": "PART|l-cc:preconj",
655
+ "615": "PART|l-discourse",
656
+ "616": "PART|l-mark",
657
+ "617": "PART|l-nsubj",
658
+ "618": "PART|r-acl",
659
+ "619": "PART|r-advmod",
660
+ "620": "PART|r-aux",
661
+ "621": "PART|r-ccomp",
662
+ "622": "PART|r-clf",
663
+ "623": "PART|r-compound",
664
+ "624": "PART|r-compound:prt",
665
+ "625": "PART|r-conj",
666
+ "626": "PART|r-discourse",
667
+ "627": "PART|r-fixed",
668
+ "628": "PART|r-mark",
669
+ "629": "PART|r-nmod",
670
+ "630": "PART|r-nmod:poss",
671
+ "631": "PART|r-obj",
672
+ "632": "PART|r-obl",
673
+ "633": "PART|root",
674
+ "634": "PRON",
675
+ "635": "PRON.",
676
+ "636": "PRON|NounType=Class|_",
677
+ "637": "PRON|NounType=Class|r-clf",
678
+ "638": "PRON|Person=1|_",
679
+ "639": "PRON|Person=1|l-nsubj",
680
+ "640": "PRON|Person=1|l-nsubj:pass",
681
+ "641": "PRON|Person=1|r-compound",
682
+ "642": "PRON|Person=1|r-nmod:poss",
683
+ "643": "PRON|Person=1|r-obj",
684
+ "644": "PRON|Person=1|r-obl",
685
+ "645": "PRON|Person=1|r-obl:poss",
686
+ "646": "PRON|Person=2|_",
687
+ "647": "PRON|Person=2|l-nsubj",
688
+ "648": "PRON|Person=2|r-compound",
689
+ "649": "PRON|Person=2|r-nmod:poss",
690
+ "650": "PRON|Person=2|r-obj",
691
+ "651": "PRON|Person=2|r-obl",
692
+ "652": "PRON|Person=3|_",
693
+ "653": "PRON|Person=3|l-advmod",
694
+ "654": "PRON|Person=3|l-nsubj",
695
+ "655": "PRON|Person=3|l-nsubj:pass",
696
+ "656": "PRON|Person=3|l-reparandum",
697
+ "657": "PRON|Person=3|r-appos",
698
+ "658": "PRON|Person=3|r-compound",
699
+ "659": "PRON|Person=3|r-conj",
700
+ "660": "PRON|Person=3|r-nmod",
701
+ "661": "PRON|Person=3|r-nmod:poss",
702
+ "662": "PRON|Person=3|r-obj",
703
+ "663": "PRON|Person=3|r-obl",
704
+ "664": "PRON|Person=3|r-obl:poss",
705
+ "665": "PRON|Person=3|r-xcomp",
706
+ "666": "PRON|PronType=Int|_",
707
+ "667": "PRON|PronType=Int|l-nsubj",
708
+ "668": "PRON|PronType=Int|r-obj",
709
+ "669": "PRON|PronType=Int|r-obl",
710
+ "670": "PRON|PronType=Int|root",
711
+ "671": "PRON|PronType=Prs|_",
712
+ "672": "PRON|PronType=Prs|l-advmod",
713
+ "673": "PRON|PronType=Prs|l-expl",
714
+ "674": "PRON|PronType=Prs|l-nsubj",
715
+ "675": "PRON|PronType=Prs|l-obj",
716
+ "676": "PRON|PronType=Prs|l-obl",
717
+ "677": "PRON|PronType=Prs|r-advcl",
718
+ "678": "PRON|PronType=Prs|r-advmod",
719
+ "679": "PRON|PronType=Prs|r-ccomp",
720
+ "680": "PRON|PronType=Prs|r-clf",
721
+ "681": "PRON|PronType=Prs|r-conj",
722
+ "682": "PRON|PronType=Prs|r-nmod",
723
+ "683": "PRON|PronType=Prs|r-nsubj",
724
+ "684": "PRON|PronType=Prs|r-obj",
725
+ "685": "PRON|PronType=Prs|r-obl",
726
+ "686": "PRON|PronType=Prs|root",
727
+ "687": "PRON|PronType=Rcp|_",
728
+ "688": "PRON|PronType=Rcp|r-advmod",
729
+ "689": "PRON|PronType=Rcp|r-iobj",
730
+ "690": "PRON|PronType=Rcp|r-nmod",
731
+ "691": "PRON|PronType=Rcp|r-obj",
732
+ "692": "PRON|PronType=Rcp|r-obl",
733
+ "693": "PRON|_",
734
+ "694": "PRON|l-advcl",
735
+ "695": "PRON|l-advmod",
736
+ "696": "PRON|l-compound",
737
+ "697": "PRON|l-csubj",
738
+ "698": "PRON|l-dislocated",
739
+ "699": "PRON|l-expl",
740
+ "700": "PRON|l-iobj",
741
+ "701": "PRON|l-mark",
742
+ "702": "PRON|l-nmod",
743
+ "703": "PRON|l-nsubj",
744
+ "704": "PRON|l-obj",
745
+ "705": "PRON|l-obl",
746
+ "706": "PRON|r-acl",
747
+ "707": "PRON|r-acl:relcl",
748
+ "708": "PRON|r-advcl",
749
+ "709": "PRON|r-advmod",
750
+ "710": "PRON|r-appos",
751
+ "711": "PRON|r-ccomp",
752
+ "712": "PRON|r-compound",
753
+ "713": "PRON|r-conj",
754
+ "714": "PRON|r-det",
755
+ "715": "PRON|r-discourse",
756
+ "716": "PRON|r-fixed",
757
+ "717": "PRON|r-flat",
758
+ "718": "PRON|r-iobj",
759
+ "719": "PRON|r-nmod",
760
+ "720": "PRON|r-nmod:poss",
761
+ "721": "PRON|r-nsubj",
762
+ "722": "PRON|r-obj",
763
+ "723": "PRON|r-obl",
764
+ "724": "PRON|r-obl:poss",
765
+ "725": "PRON|r-xcomp",
766
+ "726": "PRON|root",
767
+ "727": "PROPN",
768
+ "728": "PROPN.",
769
+ "729": "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth|_",
770
+ "730": "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth|r-obj",
771
+ "731": "PROPN|Abbr=Yes|NameType=Com|_",
772
+ "732": "PROPN|Abbr=Yes|NameType=Com|r-advmod",
773
+ "733": "PROPN|Abbr=Yes|NameType=Com|r-nmod",
774
+ "734": "PROPN|Abbr=Yes|_",
775
+ "735": "PROPN|Abbr=Yes|l-nmod",
776
+ "736": "PROPN|Abbr=Yes|l-nsubj",
777
+ "737": "PROPN|Abbr=Yes|r-nmod",
778
+ "738": "PROPN|Foreign=Yes|NameType=Com|_",
779
+ "739": "PROPN|Foreign=Yes|NameType=Com|l-nsubj",
780
+ "740": "PROPN|Foreign=Yes|NameType=Com|r-list",
781
+ "741": "PROPN|Foreign=Yes|NameType=Com|r-nmod",
782
+ "742": "PROPN|Foreign=Yes|NameType=Com|r-obl",
783
+ "743": "PROPN|Foreign=Yes|NameType=Geo|_",
784
+ "744": "PROPN|Foreign=Yes|NameType=Geo|r-obj",
785
+ "745": "PROPN|Foreign=Yes|NameType=Geo|r-obl",
786
+ "746": "PROPN|Foreign=Yes|NameType=Giv|_",
787
+ "747": "PROPN|Foreign=Yes|NameType=Giv|l-nsubj",
788
+ "748": "PROPN|Foreign=Yes|NameType=Oth|_",
789
+ "749": "PROPN|Foreign=Yes|NameType=Oth|r-conj",
790
+ "750": "PROPN|Foreign=Yes|NameType=Oth|r-flat",
791
+ "751": "PROPN|Foreign=Yes|NameType=Oth|r-nmod",
792
+ "752": "PROPN|Foreign=Yes|NameType=Prs|_",
793
+ "753": "PROPN|Foreign=Yes|NameType=Prs|l-flat",
794
+ "754": "PROPN|Foreign=Yes|NameType=Prs|l-nsubj",
795
+ "755": "PROPN|Foreign=Yes|NameType=Prs|r-conj",
796
+ "756": "PROPN|Foreign=Yes|NameType=Prs|r-flat",
797
+ "757": "PROPN|Foreign=Yes|NameType=Prs|r-nmod",
798
+ "758": "PROPN|Foreign=Yes|NameType=Prs|r-obj",
799
+ "759": "PROPN|Foreign=Yes|NameType=Prs|r-obl",
800
+ "760": "PROPN|Foreign=Yes|NameType=Sur|_",
801
+ "761": "PROPN|Foreign=Yes|NameType=Sur|r-flat",
802
+ "762": "PROPN|Foreign=Yes|_",
803
+ "763": "PROPN|Foreign=Yes|l-flat",
804
+ "764": "PROPN|Foreign=Yes|l-nmod",
805
+ "765": "PROPN|Foreign=Yes|l-nsubj",
806
+ "766": "PROPN|Foreign=Yes|l-obl",
807
+ "767": "PROPN|Foreign=Yes|r-appos",
808
+ "768": "PROPN|Foreign=Yes|r-ccomp",
809
+ "769": "PROPN|Foreign=Yes|r-compound",
810
+ "770": "PROPN|Foreign=Yes|r-conj",
811
+ "771": "PROPN|Foreign=Yes|r-flat",
812
+ "772": "PROPN|Foreign=Yes|r-iobj",
813
+ "773": "PROPN|Foreign=Yes|r-list",
814
+ "774": "PROPN|Foreign=Yes|r-nmod",
815
+ "775": "PROPN|Foreign=Yes|r-nsubj",
816
+ "776": "PROPN|Foreign=Yes|r-obj",
817
+ "777": "PROPN|Foreign=Yes|r-obl",
818
+ "778": "PROPN|Foreign=Yes|root",
819
+ "779": "PROPN|NameType=Com|_",
820
+ "780": "PROPN|NameType=Com|l-nsubj",
821
+ "781": "PROPN|NameType=Com|l-obl",
822
+ "782": "PROPN|NameType=Com|r-appos",
823
+ "783": "PROPN|NameType=Com|r-conj",
824
+ "784": "PROPN|NameType=Com|r-flat",
825
+ "785": "PROPN|NameType=Com|r-list",
826
+ "786": "PROPN|NameType=Com|r-nmod",
827
+ "787": "PROPN|NameType=Com|r-nsubj",
828
+ "788": "PROPN|NameType=Com|r-obj",
829
+ "789": "PROPN|NameType=Com|r-obl",
830
+ "790": "PROPN|NameType=Geo|_",
831
+ "791": "PROPN|NameType=Geo|l-nsubj",
832
+ "792": "PROPN|NameType=Geo|l-obl",
833
+ "793": "PROPN|NameType=Geo|r-compound",
834
+ "794": "PROPN|NameType=Geo|r-conj",
835
+ "795": "PROPN|NameType=Geo|r-flat",
836
+ "796": "PROPN|NameType=Geo|r-list",
837
+ "797": "PROPN|NameType=Geo|r-nmod",
838
+ "798": "PROPN|NameType=Geo|r-nsubj",
839
+ "799": "PROPN|NameType=Geo|r-nummod",
840
+ "800": "PROPN|NameType=Geo|r-obj",
841
+ "801": "PROPN|NameType=Geo|r-obl",
842
+ "802": "PROPN|NameType=Geo|root",
843
+ "803": "PROPN|NameType=Giv|_",
844
+ "804": "PROPN|NameType=Giv|l-dislocated",
845
+ "805": "PROPN|NameType=Giv|l-nsubj",
846
+ "806": "PROPN|NameType=Giv|l-obl",
847
+ "807": "PROPN|NameType=Giv|r-acl",
848
+ "808": "PROPN|NameType=Giv|r-appos",
849
+ "809": "PROPN|NameType=Giv|r-ccomp",
850
+ "810": "PROPN|NameType=Giv|r-conj",
851
+ "811": "PROPN|NameType=Giv|r-flat",
852
+ "812": "PROPN|NameType=Giv|r-list",
853
+ "813": "PROPN|NameType=Giv|r-nmod",
854
+ "814": "PROPN|NameType=Giv|r-nsubj",
855
+ "815": "PROPN|NameType=Giv|r-obj",
856
+ "816": "PROPN|NameType=Giv|r-obl",
857
+ "817": "PROPN|NameType=Giv|root",
858
+ "818": "PROPN|NameType=Nat|_",
859
+ "819": "PROPN|NameType=Nat|l-csubj",
860
+ "820": "PROPN|NameType=Nat|l-nsubj",
861
+ "821": "PROPN|NameType=Nat|l-obl",
862
+ "822": "PROPN|NameType=Nat|r-acl",
863
+ "823": "PROPN|NameType=Nat|r-appos",
864
+ "824": "PROPN|NameType=Nat|r-compound",
865
+ "825": "PROPN|NameType=Nat|r-conj",
866
+ "826": "PROPN|NameType=Nat|r-flat",
867
+ "827": "PROPN|NameType=Nat|r-list",
868
+ "828": "PROPN|NameType=Nat|r-nmod",
869
+ "829": "PROPN|NameType=Nat|r-nummod",
870
+ "830": "PROPN|NameType=Nat|r-obj",
871
+ "831": "PROPN|NameType=Nat|r-obl",
872
+ "832": "PROPN|NameType=Oth|_",
873
+ "833": "PROPN|NameType=Oth|l-dislocated",
874
+ "834": "PROPN|NameType=Oth|l-nsubj",
875
+ "835": "PROPN|NameType=Oth|r-acl",
876
+ "836": "PROPN|NameType=Oth|r-appos",
877
+ "837": "PROPN|NameType=Oth|r-compound",
878
+ "838": "PROPN|NameType=Oth|r-conj",
879
+ "839": "PROPN|NameType=Oth|r-flat",
880
+ "840": "PROPN|NameType=Oth|r-nmod",
881
+ "841": "PROPN|NameType=Oth|r-obj",
882
+ "842": "PROPN|NameType=Oth|r-obl",
883
+ "843": "PROPN|NameType=Oth|root",
884
+ "844": "PROPN|NameType=Pro|_",
885
+ "845": "PROPN|NameType=Pro|l-nsubj",
886
+ "846": "PROPN|NameType=Pro|l-obl",
887
+ "847": "PROPN|NameType=Pro|r-advcl",
888
+ "848": "PROPN|NameType=Pro|r-flat",
889
+ "849": "PROPN|NameType=Pro|r-nmod",
890
+ "850": "PROPN|NameType=Pro|r-obj",
891
+ "851": "PROPN|NameType=Prs|_",
892
+ "852": "PROPN|NameType=Prs|l-dislocated",
893
+ "853": "PROPN|NameType=Prs|l-nsubj",
894
+ "854": "PROPN|NameType=Prs|l-obl",
895
+ "855": "PROPN|NameType=Prs|l-vocative",
896
+ "856": "PROPN|NameType=Prs|r-conj",
897
+ "857": "PROPN|NameType=Prs|r-discourse",
898
+ "858": "PROPN|NameType=Prs|r-flat",
899
+ "859": "PROPN|NameType=Prs|r-list",
900
+ "860": "PROPN|NameType=Prs|r-nmod",
901
+ "861": "PROPN|NameType=Prs|r-obj",
902
+ "862": "PROPN|NameType=Prs|r-obl",
903
+ "863": "PROPN|NameType=Prs|r-vocative",
904
+ "864": "PROPN|NameType=Sur|_",
905
+ "865": "PROPN|NameType=Sur|l-nsubj",
906
+ "866": "PROPN|NameType=Sur|r-flat",
907
+ "867": "PROPN|NameType=Sur|r-nmod",
908
+ "868": "PROPN|NounType=Class|_",
909
+ "869": "PROPN|NounType=Class|r-clf",
910
+ "870": "PROPN|Prefix=Yes|_",
911
+ "871": "PROPN|Prefix=Yes|l-nsubj",
912
+ "872": "PROPN|Prefix=Yes|r-nmod",
913
+ "873": "PROPN|_",
914
+ "874": "PROPN|l-advmod",
915
+ "875": "PROPN|l-aux",
916
+ "876": "PROPN|l-compound",
917
+ "877": "PROPN|l-nsubj",
918
+ "878": "PROPN|l-nsubj:pass",
919
+ "879": "PROPN|l-obj",
920
+ "880": "PROPN|l-obl",
921
+ "881": "PROPN|l-obl:tmod",
922
+ "882": "PROPN|r-acl",
923
+ "883": "PROPN|r-acl:relcl",
924
+ "884": "PROPN|r-advmod",
925
+ "885": "PROPN|r-appos",
926
+ "886": "PROPN|r-cc",
927
+ "887": "PROPN|r-ccomp",
928
+ "888": "PROPN|r-clf",
929
+ "889": "PROPN|r-compound",
930
+ "890": "PROPN|r-conj",
931
+ "891": "PROPN|r-fixed",
932
+ "892": "PROPN|r-flat",
933
+ "893": "PROPN|r-flat:name",
934
+ "894": "PROPN|r-goeswith",
935
+ "895": "PROPN|r-iobj",
936
+ "896": "PROPN|r-list",
937
+ "897": "PROPN|r-nmod",
938
+ "898": "PROPN|r-nmod:poss",
939
+ "899": "PROPN|r-obj",
940
+ "900": "PROPN|r-obl",
941
+ "901": "PROPN|r-obl:poss",
942
+ "902": "PROPN|r-obl:tmod",
943
+ "903": "PROPN|r-xcomp",
944
+ "904": "PROPN|root",
945
+ "905": "PUNCT",
946
+ "906": "PUNCT.",
947
+ "907": "PUNCT|NounType=Class|_",
948
+ "908": "PUNCT|NounType=Class|r-punct",
949
+ "909": "PUNCT|_",
950
+ "910": "PUNCT|l-advmod",
951
+ "911": "PUNCT|l-dep",
952
+ "912": "PUNCT|l-punct",
953
+ "913": "PUNCT|r-advmod",
954
+ "914": "PUNCT|r-clf",
955
+ "915": "PUNCT|r-dep",
956
+ "916": "PUNCT|r-punct",
957
+ "917": "PUNCT|root",
958
+ "918": "SCONJ",
959
+ "919": "SCONJ.",
960
+ "920": "SCONJ|NumType=Mult|_",
961
+ "921": "SCONJ|NumType=Mult|l-mark",
962
+ "922": "SCONJ|Prefix=Yes|_",
963
+ "923": "SCONJ|Prefix=Yes|l-cc",
964
+ "924": "SCONJ|Prefix=Yes|l-mark",
965
+ "925": "SCONJ|VerbType=Cop|_",
966
+ "926": "SCONJ|VerbType=Cop|l-mark",
967
+ "927": "SCONJ|_",
968
+ "928": "SCONJ|l-advmod",
969
+ "929": "SCONJ|l-case",
970
+ "930": "SCONJ|l-cc",
971
+ "931": "SCONJ|l-discourse",
972
+ "932": "SCONJ|l-mark",
973
+ "933": "SCONJ|l-nsubj",
974
+ "934": "SCONJ|l-orphan",
975
+ "935": "SCONJ|r-advcl",
976
+ "936": "SCONJ|r-compound",
977
+ "937": "SCONJ|r-fixed",
978
+ "938": "SCONJ|r-flat",
979
+ "939": "SCONJ|r-mark",
980
+ "940": "SCONJ|r-orphan",
981
+ "941": "SCONJ|root",
982
+ "942": "SYM",
983
+ "943": "SYM.",
984
+ "944": "SYM|_",
985
+ "945": "SYM|l-dep",
986
+ "946": "SYM|l-nsubj",
987
+ "947": "SYM|r-advmod",
988
+ "948": "SYM|r-clf",
989
+ "949": "SYM|r-nmod",
990
+ "950": "SYM|r-obj",
991
+ "951": "SYM|r-obl",
992
+ "952": "SYM|r-xcomp",
993
+ "953": "VERB",
994
+ "954": "VERB.",
995
+ "955": "VERB|Abbr=Yes|_",
996
+ "956": "VERB|Abbr=Yes|r-acl",
997
+ "957": "VERB|Foreign=Yes|_",
998
+ "958": "VERB|Foreign=Yes|l-nsubj",
999
+ "959": "VERB|Foreign=Yes|r-acl",
1000
+ "960": "VERB|Foreign=Yes|r-advcl",
1001
+ "961": "VERB|Foreign=Yes|r-ccomp",
1002
+ "962": "VERB|Foreign=Yes|r-compound",
1003
+ "963": "VERB|Foreign=Yes|r-conj",
1004
+ "964": "VERB|Foreign=Yes|r-flat",
1005
+ "965": "VERB|Foreign=Yes|r-nmod",
1006
+ "966": "VERB|Foreign=Yes|r-xcomp",
1007
+ "967": "VERB|Foreign=Yes|root",
1008
+ "968": "VERB|Mood=Imp|_",
1009
+ "969": "VERB|Mood=Imp|r-xcomp",
1010
+ "970": "VERB|NounType=Class|_",
1011
+ "971": "VERB|NounType=Class|r-acl",
1012
+ "972": "VERB|NounType=Class|r-compound",
1013
+ "973": "VERB|PartType=Adj|_",
1014
+ "974": "VERB|PartType=Adj|r-acl",
1015
+ "975": "VERB|Prefix=Yes|_",
1016
+ "976": "VERB|Prefix=Yes|l-acl",
1017
+ "977": "VERB|Prefix=Yes|l-nsubj",
1018
+ "978": "VERB|Prefix=Yes|r-acl",
1019
+ "979": "VERB|Prefix=Yes|r-advcl",
1020
+ "980": "VERB|Prefix=Yes|r-ccomp",
1021
+ "981": "VERB|Prefix=Yes|r-compound",
1022
+ "982": "VERB|Prefix=Yes|r-conj",
1023
+ "983": "VERB|Prefix=Yes|r-parataxis",
1024
+ "984": "VERB|Prefix=Yes|root",
1025
+ "985": "VERB|VerbType=Cop|_",
1026
+ "986": "VERB|VerbType=Cop|l-advmod",
1027
+ "987": "VERB|VerbType=Cop|l-cop",
1028
+ "988": "VERB|VerbType=Cop|r-acl",
1029
+ "989": "VERB|VerbType=Cop|r-advcl",
1030
+ "990": "VERB|VerbType=Cop|r-ccomp",
1031
+ "991": "VERB|VerbType=Cop|r-compound",
1032
+ "992": "VERB|VerbType=Cop|r-parataxis",
1033
+ "993": "VERB|VerbType=Cop|root",
1034
+ "994": "VERB|_",
1035
+ "995": "VERB|l-acl",
1036
+ "996": "VERB|l-acl:relcl",
1037
+ "997": "VERB|l-advcl",
1038
+ "998": "VERB|l-advmod",
1039
+ "999": "VERB|l-case",
1040
+ "1000": "VERB|l-cc",
1041
+ "1001": "VERB|l-ccomp",
1042
+ "1002": "VERB|l-compound",
1043
+ "1003": "VERB|l-conj",
1044
+ "1004": "VERB|l-cop",
1045
+ "1005": "VERB|l-csubj",
1046
+ "1006": "VERB|l-discourse",
1047
+ "1007": "VERB|l-dislocated",
1048
+ "1008": "VERB|l-mark",
1049
+ "1009": "VERB|l-nmod",
1050
+ "1010": "VERB|l-nsubj",
1051
+ "1011": "VERB|l-obj",
1052
+ "1012": "VERB|l-obl",
1053
+ "1013": "VERB|l-orphan",
1054
+ "1014": "VERB|l-xcomp",
1055
+ "1015": "VERB|r-acl",
1056
+ "1016": "VERB|r-acl:relcl",
1057
+ "1017": "VERB|r-advcl",
1058
+ "1018": "VERB|r-advmod",
1059
+ "1019": "VERB|r-appos",
1060
+ "1020": "VERB|r-case",
1061
+ "1021": "VERB|r-cc",
1062
+ "1022": "VERB|r-ccomp",
1063
+ "1023": "VERB|r-clf",
1064
+ "1024": "VERB|r-compound",
1065
+ "1025": "VERB|r-conj",
1066
+ "1026": "VERB|r-dep",
1067
+ "1027": "VERB|r-det",
1068
+ "1028": "VERB|r-discourse",
1069
+ "1029": "VERB|r-fixed",
1070
+ "1030": "VERB|r-flat",
1071
+ "1031": "VERB|r-list",
1072
+ "1032": "VERB|r-mark",
1073
+ "1033": "VERB|r-nmod",
1074
+ "1034": "VERB|r-nmod:poss",
1075
+ "1035": "VERB|r-nsubj",
1076
+ "1036": "VERB|r-obj",
1077
+ "1037": "VERB|r-obl",
1078
+ "1038": "VERB|r-obl:poss",
1079
+ "1039": "VERB|r-orphan",
1080
+ "1040": "VERB|r-parataxis",
1081
+ "1041": "VERB|r-punct",
1082
+ "1042": "VERB|r-xcomp",
1083
+ "1043": "VERB|root"
1084
+ },
1085
+ "initializer_cutoff_factor": 2.0,
1086
+ "initializer_range": 0.02,
1087
+ "intermediate_size": 2624,
1088
+ "label2id": {
1089
+ "ADP": 0,
1090
+ "ADP.": 1,
1091
+ "ADP|Foreign=Yes|_": 2,
1092
+ "ADP|Foreign=Yes|l-case": 3,
1093
+ "ADP|NounType=Class|_": 4,
1094
+ "ADP|NounType=Class|l-case": 5,
1095
+ "ADP|Prefix=Yes|_": 6,
1096
+ "ADP|Prefix=Yes|l-case": 7,
1097
+ "ADP|Prefix=Yes|l-mark": 8,
1098
+ "ADP|_": 9,
1099
+ "ADP|l-acl": 10,
1100
+ "ADP|l-advcl": 11,
1101
+ "ADP|l-advmod": 12,
1102
+ "ADP|l-case": 13,
1103
+ "ADP|l-cc": 14,
1104
+ "ADP|l-cc:preconj": 15,
1105
+ "ADP|l-csubj": 16,
1106
+ "ADP|l-dep": 17,
1107
+ "ADP|l-fixed": 18,
1108
+ "ADP|l-flat": 19,
1109
+ "ADP|l-mark": 20,
1110
+ "ADP|l-nmod": 21,
1111
+ "ADP|l-nsubj": 22,
1112
+ "ADP|l-obl": 23,
1113
+ "ADP|l-orphan": 24,
1114
+ "ADP|r-acl": 25,
1115
+ "ADP|r-advmod": 26,
1116
+ "ADP|r-appos": 27,
1117
+ "ADP|r-case": 28,
1118
+ "ADP|r-compound": 29,
1119
+ "ADP|r-conj": 30,
1120
+ "ADP|r-fixed": 31,
1121
+ "ADP|r-flat": 32,
1122
+ "ADP|r-mark": 33,
1123
+ "ADP|r-obl": 34,
1124
+ "ADP|r-orphan": 35,
1125
+ "ADP|root": 36,
1126
+ "ADV": 37,
1127
+ "ADV.": 38,
1128
+ "ADV|Foreign=Yes|_": 39,
1129
+ "ADV|Foreign=Yes|l-advmod": 40,
1130
+ "ADV|Foreign=Yes|r-advmod": 41,
1131
+ "ADV|NumType=Mult|_": 42,
1132
+ "ADV|NumType=Mult|r-advmod": 43,
1133
+ "ADV|PartType=Adv|_": 44,
1134
+ "ADV|PartType=Adv|l-advmod": 45,
1135
+ "ADV|PartType=Adv|l-mark": 46,
1136
+ "ADV|PartType=Adv|r-advmod": 47,
1137
+ "ADV|PartType=Enp|_": 48,
1138
+ "ADV|PartType=Enp|l-advmod": 49,
1139
+ "ADV|PartType=Enp|r-advmod": 50,
1140
+ "ADV|PartType=Int|_": 51,
1141
+ "ADV|PartType=Int|r-advmod": 52,
1142
+ "ADV|PartType=Int|r-fixed": 53,
1143
+ "ADV|Prefix=Yes|_": 54,
1144
+ "ADV|Prefix=Yes|l-advmod": 55,
1145
+ "ADV|Prefix=Yes|l-mark": 56,
1146
+ "ADV|Prefix=Yes|r-advmod": 57,
1147
+ "ADV|PronType=Int|_": 58,
1148
+ "ADV|PronType=Int|l-advmod": 59,
1149
+ "ADV|PronType=Int|r-advmod": 60,
1150
+ "ADV|_": 61,
1151
+ "ADV|l-acl": 62,
1152
+ "ADV|l-advcl": 63,
1153
+ "ADV|l-advmod": 64,
1154
+ "ADV|l-aux": 65,
1155
+ "ADV|l-case": 66,
1156
+ "ADV|l-cc": 67,
1157
+ "ADV|l-compound": 68,
1158
+ "ADV|l-dep": 69,
1159
+ "ADV|l-det": 70,
1160
+ "ADV|l-discourse": 71,
1161
+ "ADV|l-fixed": 72,
1162
+ "ADV|l-mark": 73,
1163
+ "ADV|l-orphan": 74,
1164
+ "ADV|l-xcomp": 75,
1165
+ "ADV|r-acl": 76,
1166
+ "ADV|r-advcl": 77,
1167
+ "ADV|r-advmod": 78,
1168
+ "ADV|r-aux": 79,
1169
+ "ADV|r-ccomp": 80,
1170
+ "ADV|r-compound": 81,
1171
+ "ADV|r-conj": 82,
1172
+ "ADV|r-det": 83,
1173
+ "ADV|r-fixed": 84,
1174
+ "ADV|r-flat": 85,
1175
+ "ADV|r-mark": 86,
1176
+ "ADV|r-nmod": 87,
1177
+ "ADV|r-obj": 88,
1178
+ "ADV|r-orphan": 89,
1179
+ "ADV|r-xcomp": 90,
1180
+ "ADV|root": 91,
1181
+ "AUX": 92,
1182
+ "AUX.": 93,
1183
+ "AUX|Foreign=Yes|_": 94,
1184
+ "AUX|Foreign=Yes|l-aux": 95,
1185
+ "AUX|Mood=Imp|_": 96,
1186
+ "AUX|Mood=Imp|l-aux": 97,
1187
+ "AUX|NounType=Class|_": 98,
1188
+ "AUX|NounType=Class|r-appos": 99,
1189
+ "AUX|Prefix=Yes|_": 100,
1190
+ "AUX|Prefix=Yes|l-aux": 101,
1191
+ "AUX|Prefix=Yes|r-aux": 102,
1192
+ "AUX|VerbType=Cop|_": 103,
1193
+ "AUX|VerbType=Cop|l-acl": 104,
1194
+ "AUX|VerbType=Cop|l-advcl": 105,
1195
+ "AUX|VerbType=Cop|l-aux": 106,
1196
+ "AUX|VerbType=Cop|l-cop": 107,
1197
+ "AUX|VerbType=Cop|r-acl": 108,
1198
+ "AUX|VerbType=Cop|r-advcl": 109,
1199
+ "AUX|VerbType=Cop|r-aux": 110,
1200
+ "AUX|VerbType=Cop|r-conj": 111,
1201
+ "AUX|VerbType=Cop|r-mark": 112,
1202
+ "AUX|VerbType=Cop|root": 113,
1203
+ "AUX|Voice=Pass|_": 114,
1204
+ "AUX|Voice=Pass|l-aux": 115,
1205
+ "AUX|Voice=Pass|l-aux:pass": 116,
1206
+ "AUX|Voice=Pass|r-aux:pass": 117,
1207
+ "AUX|_": 118,
1208
+ "AUX|l-advmod": 119,
1209
+ "AUX|l-aux": 120,
1210
+ "AUX|l-aux:pass": 121,
1211
+ "AUX|l-cop": 122,
1212
+ "AUX|l-mark": 123,
1213
+ "AUX|r-acl": 124,
1214
+ "AUX|r-advmod": 125,
1215
+ "AUX|r-aux": 126,
1216
+ "AUX|r-ccomp": 127,
1217
+ "AUX|r-clf": 128,
1218
+ "AUX|r-compound": 129,
1219
+ "AUX|r-conj": 130,
1220
+ "AUX|r-fixed": 131,
1221
+ "AUX|r-mark": 132,
1222
+ "AUX|root": 133,
1223
+ "B-ADP": 134,
1224
+ "B-ADP.": 135,
1225
+ "B-ADV": 136,
1226
+ "B-ADV.": 137,
1227
+ "B-AUX": 138,
1228
+ "B-AUX.": 139,
1229
+ "B-CCONJ": 140,
1230
+ "B-CCONJ.": 141,
1231
+ "B-DET": 142,
1232
+ "B-DET.": 143,
1233
+ "B-INTJ": 144,
1234
+ "B-INTJ.": 145,
1235
+ "B-NOUN": 146,
1236
+ "B-NOUN.": 147,
1237
+ "B-NUM": 148,
1238
+ "B-NUM.": 149,
1239
+ "B-PART": 150,
1240
+ "B-PART.": 151,
1241
+ "B-PRON": 152,
1242
+ "B-PRON.": 153,
1243
+ "B-PROPN": 154,
1244
+ "B-PROPN.": 155,
1245
+ "B-PUNCT": 156,
1246
+ "B-PUNCT.": 157,
1247
+ "B-SCONJ": 158,
1248
+ "B-SCONJ.": 159,
1249
+ "B-SYM": 160,
1250
+ "B-SYM.": 161,
1251
+ "B-VERB": 162,
1252
+ "B-VERB.": 163,
1253
+ "CCONJ": 164,
1254
+ "CCONJ.": 165,
1255
+ "CCONJ|Foreign=Yes|_": 166,
1256
+ "CCONJ|Foreign=Yes|l-cc": 167,
1257
+ "CCONJ|PronType=Prs|_": 168,
1258
+ "CCONJ|PronType=Prs|l-cc": 169,
1259
+ "CCONJ|_": 170,
1260
+ "CCONJ|l-advmod": 171,
1261
+ "CCONJ|l-case": 172,
1262
+ "CCONJ|l-cc": 173,
1263
+ "CCONJ|l-conj": 174,
1264
+ "CCONJ|l-discourse": 175,
1265
+ "CCONJ|l-fixed": 176,
1266
+ "CCONJ|l-flat": 177,
1267
+ "CCONJ|l-mark": 178,
1268
+ "CCONJ|l-nsubj": 179,
1269
+ "CCONJ|l-obj": 180,
1270
+ "CCONJ|l-orphan": 181,
1271
+ "CCONJ|r-cc": 182,
1272
+ "CCONJ|r-compound": 183,
1273
+ "CCONJ|r-conj": 184,
1274
+ "CCONJ|r-fixed": 185,
1275
+ "CCONJ|r-mark": 186,
1276
+ "CCONJ|r-obl": 187,
1277
+ "CCONJ|root": 188,
1278
+ "DET": 189,
1279
+ "DET.": 190,
1280
+ "DET|NumType=Mult|_": 191,
1281
+ "DET|NumType=Mult|l-det": 192,
1282
+ "DET|PartType=Emp|_": 193,
1283
+ "DET|PartType=Emp|r-det": 194,
1284
+ "DET|PartType=Int|_": 195,
1285
+ "DET|PartType=Int|r-det": 196,
1286
+ "DET|PronType=Int|_": 197,
1287
+ "DET|PronType=Int|r-det": 198,
1288
+ "DET|_": 199,
1289
+ "DET|l-advmod": 200,
1290
+ "DET|l-case": 201,
1291
+ "DET|l-cc:preconj": 202,
1292
+ "DET|l-compound": 203,
1293
+ "DET|l-det": 204,
1294
+ "DET|l-det:predet": 205,
1295
+ "DET|l-discourse": 206,
1296
+ "DET|l-mark": 207,
1297
+ "DET|l-nsubj": 208,
1298
+ "DET|l-nsubj:pass": 209,
1299
+ "DET|l-obj": 210,
1300
+ "DET|l-obl": 211,
1301
+ "DET|l-obl:tmod": 212,
1302
+ "DET|l-orphan": 213,
1303
+ "DET|r-advmod": 214,
1304
+ "DET|r-compound": 215,
1305
+ "DET|r-conj": 216,
1306
+ "DET|r-dep": 217,
1307
+ "DET|r-det": 218,
1308
+ "DET|r-fixed": 219,
1309
+ "DET|r-flat": 220,
1310
+ "DET|r-list": 221,
1311
+ "DET|r-nmod": 222,
1312
+ "DET|r-nummod": 223,
1313
+ "DET|r-obj": 224,
1314
+ "DET|r-obl": 225,
1315
+ "DET|r-orphan": 226,
1316
+ "DET|root": 227,
1317
+ "I-ADP": 228,
1318
+ "I-ADP.": 229,
1319
+ "I-ADV": 230,
1320
+ "I-ADV.": 231,
1321
+ "I-AUX": 232,
1322
+ "I-AUX.": 233,
1323
+ "I-CCONJ": 234,
1324
+ "I-CCONJ.": 235,
1325
+ "I-DET": 236,
1326
+ "I-DET.": 237,
1327
+ "I-INTJ": 238,
1328
+ "I-INTJ.": 239,
1329
+ "I-NOUN": 240,
1330
+ "I-NOUN.": 241,
1331
+ "I-NUM": 242,
1332
+ "I-NUM.": 243,
1333
+ "I-PART": 244,
1334
+ "I-PART.": 245,
1335
+ "I-PRON": 246,
1336
+ "I-PRON.": 247,
1337
+ "I-PROPN": 248,
1338
+ "I-PROPN.": 249,
1339
+ "I-PUNCT": 250,
1340
+ "I-PUNCT.": 251,
1341
+ "I-SCONJ": 252,
1342
+ "I-SCONJ.": 253,
1343
+ "I-SYM": 254,
1344
+ "I-SYM.": 255,
1345
+ "I-VERB": 256,
1346
+ "I-VERB.": 257,
1347
+ "INTJ": 258,
1348
+ "INTJ.": 259,
1349
+ "INTJ|_": 260,
1350
+ "INTJ|l-nsubj": 261,
1351
+ "INTJ|r-acl": 262,
1352
+ "INTJ|root": 263,
1353
+ "NOUN": 264,
1354
+ "NOUN.": 265,
1355
+ "NOUN|Abbr=Yes|Foreign=Yes|_": 266,
1356
+ "NOUN|Abbr=Yes|Foreign=Yes|r-nmod": 267,
1357
+ "NOUN|Abbr=Yes|Prefix=Yes|_": 268,
1358
+ "NOUN|Abbr=Yes|Prefix=Yes|l-flat": 269,
1359
+ "NOUN|Abbr=Yes|_": 270,
1360
+ "NOUN|Abbr=Yes|l-flat": 271,
1361
+ "NOUN|Abbr=Yes|l-nmod": 272,
1362
+ "NOUN|Abbr=Yes|l-nsubj": 273,
1363
+ "NOUN|Abbr=Yes|l-obl": 274,
1364
+ "NOUN|Abbr=Yes|r-acl": 275,
1365
+ "NOUN|Abbr=Yes|r-appos": 276,
1366
+ "NOUN|Abbr=Yes|r-clf": 277,
1367
+ "NOUN|Abbr=Yes|r-conj": 278,
1368
+ "NOUN|Abbr=Yes|r-fixed": 279,
1369
+ "NOUN|Abbr=Yes|r-flat": 280,
1370
+ "NOUN|Abbr=Yes|r-nmod": 281,
1371
+ "NOUN|Abbr=Yes|r-obj": 282,
1372
+ "NOUN|Abbr=Yes|r-obl": 283,
1373
+ "NOUN|Foreign=Yes|NounType=Class|_": 284,
1374
+ "NOUN|Foreign=Yes|NounType=Class|r-clf": 285,
1375
+ "NOUN|Foreign=Yes|NounType=Class|r-obj": 286,
1376
+ "NOUN|Foreign=Yes|Prefix=Yes|_": 287,
1377
+ "NOUN|Foreign=Yes|Prefix=Yes|l-flat": 288,
1378
+ "NOUN|Foreign=Yes|Prefix=Yes|r-appos": 289,
1379
+ "NOUN|Foreign=Yes|_": 290,
1380
+ "NOUN|Foreign=Yes|l-dislocated": 291,
1381
+ "NOUN|Foreign=Yes|l-flat": 292,
1382
+ "NOUN|Foreign=Yes|l-nmod": 293,
1383
+ "NOUN|Foreign=Yes|l-nsubj": 294,
1384
+ "NOUN|Foreign=Yes|l-obl": 295,
1385
+ "NOUN|Foreign=Yes|r-acl": 296,
1386
+ "NOUN|Foreign=Yes|r-advcl": 297,
1387
+ "NOUN|Foreign=Yes|r-advmod": 298,
1388
+ "NOUN|Foreign=Yes|r-appos": 299,
1389
+ "NOUN|Foreign=Yes|r-ccomp": 300,
1390
+ "NOUN|Foreign=Yes|r-clf": 301,
1391
+ "NOUN|Foreign=Yes|r-compound": 302,
1392
+ "NOUN|Foreign=Yes|r-conj": 303,
1393
+ "NOUN|Foreign=Yes|r-flat": 304,
1394
+ "NOUN|Foreign=Yes|r-iobj": 305,
1395
+ "NOUN|Foreign=Yes|r-list": 306,
1396
+ "NOUN|Foreign=Yes|r-nmod": 307,
1397
+ "NOUN|Foreign=Yes|r-obj": 308,
1398
+ "NOUN|Foreign=Yes|r-obl": 309,
1399
+ "NOUN|Foreign=Yes|r-xcomp": 310,
1400
+ "NOUN|Foreign=Yes|root": 311,
1401
+ "NOUN|NameType=Com|_": 312,
1402
+ "NOUN|NameType=Com|r-nmod": 313,
1403
+ "NOUN|NameType=Geo|_": 314,
1404
+ "NOUN|NameType=Geo|l-nsubj": 315,
1405
+ "NOUN|NameType=Geo|r-nmod": 316,
1406
+ "NOUN|NameType=Geo|r-obj": 317,
1407
+ "NOUN|NameType=Nat|_": 318,
1408
+ "NOUN|NameType=Nat|r-nmod": 319,
1409
+ "NOUN|NameType=Oth|_": 320,
1410
+ "NOUN|NameType=Oth|l-nsubj": 321,
1411
+ "NOUN|NameType=Oth|r-conj": 322,
1412
+ "NOUN|NameType=Oth|r-flat": 323,
1413
+ "NOUN|NameType=Oth|r-nmod": 324,
1414
+ "NOUN|NameType=Pro|_": 325,
1415
+ "NOUN|NameType=Pro|r-nmod": 326,
1416
+ "NOUN|NameType=Prs|_": 327,
1417
+ "NOUN|NameType=Prs|l-nsubj": 328,
1418
+ "NOUN|NameType=Prs|r-nmod": 329,
1419
+ "NOUN|NounType=Class|Prefix=Yes|_": 330,
1420
+ "NOUN|NounType=Class|Prefix=Yes|l-advcl": 331,
1421
+ "NOUN|NounType=Class|Prefix=Yes|l-advmod": 332,
1422
+ "NOUN|NounType=Class|Prefix=Yes|l-mark": 333,
1423
+ "NOUN|NounType=Class|Prefix=Yes|l-nmod": 334,
1424
+ "NOUN|NounType=Class|Prefix=Yes|l-nsubj": 335,
1425
+ "NOUN|NounType=Class|Prefix=Yes|r-advcl": 336,
1426
+ "NOUN|NounType=Class|Prefix=Yes|r-clf": 337,
1427
+ "NOUN|NounType=Class|Prefix=Yes|r-nmod": 338,
1428
+ "NOUN|NounType=Class|Prefix=Yes|r-obj": 339,
1429
+ "NOUN|NounType=Class|_": 340,
1430
+ "NOUN|NounType=Class|l-advcl": 341,
1431
+ "NOUN|NounType=Class|l-advmod": 342,
1432
+ "NOUN|NounType=Class|l-clf": 343,
1433
+ "NOUN|NounType=Class|l-dislocated": 344,
1434
+ "NOUN|NounType=Class|l-nmod": 345,
1435
+ "NOUN|NounType=Class|l-nsubj": 346,
1436
+ "NOUN|NounType=Class|l-obj": 347,
1437
+ "NOUN|NounType=Class|l-obl": 348,
1438
+ "NOUN|NounType=Class|r-acl": 349,
1439
+ "NOUN|NounType=Class|r-advcl": 350,
1440
+ "NOUN|NounType=Class|r-advmod": 351,
1441
+ "NOUN|NounType=Class|r-appos": 352,
1442
+ "NOUN|NounType=Class|r-cc": 353,
1443
+ "NOUN|NounType=Class|r-ccomp": 354,
1444
+ "NOUN|NounType=Class|r-clf": 355,
1445
+ "NOUN|NounType=Class|r-compound": 356,
1446
+ "NOUN|NounType=Class|r-conj": 357,
1447
+ "NOUN|NounType=Class|r-dislocated": 358,
1448
+ "NOUN|NounType=Class|r-fixed": 359,
1449
+ "NOUN|NounType=Class|r-flat": 360,
1450
+ "NOUN|NounType=Class|r-iobj": 361,
1451
+ "NOUN|NounType=Class|r-list": 362,
1452
+ "NOUN|NounType=Class|r-nmod": 363,
1453
+ "NOUN|NounType=Class|r-nummod": 364,
1454
+ "NOUN|NounType=Class|r-obj": 365,
1455
+ "NOUN|NounType=Class|r-obl": 366,
1456
+ "NOUN|NounType=Class|r-orphan": 367,
1457
+ "NOUN|NounType=Class|r-xcomp": 368,
1458
+ "NOUN|NounType=Class|root": 369,
1459
+ "NOUN|NumType=Mult|_": 370,
1460
+ "NOUN|NumType=Mult|r-advcl": 371,
1461
+ "NOUN|NumType=Mult|r-nmod": 372,
1462
+ "NOUN|NumType=Mult|r-obj": 373,
1463
+ "NOUN|PartType=Enp|_": 374,
1464
+ "NOUN|PartType=Enp|r-obj": 375,
1465
+ "NOUN|PartType=Enp|r-obl": 376,
1466
+ "NOUN|PartType=Int|_": 377,
1467
+ "NOUN|PartType=Int|r-obj": 378,
1468
+ "NOUN|PartType=Res|_": 379,
1469
+ "NOUN|PartType=Res|r-nmod": 380,
1470
+ "NOUN|PartType=Res|r-obj": 381,
1471
+ "NOUN|Prefix=Yes|_": 382,
1472
+ "NOUN|Prefix=Yes|l-acl": 383,
1473
+ "NOUN|Prefix=Yes|l-advcl": 384,
1474
+ "NOUN|Prefix=Yes|l-clf": 385,
1475
+ "NOUN|Prefix=Yes|l-csubj": 386,
1476
+ "NOUN|Prefix=Yes|l-dislocated": 387,
1477
+ "NOUN|Prefix=Yes|l-flat": 388,
1478
+ "NOUN|Prefix=Yes|l-nmod": 389,
1479
+ "NOUN|Prefix=Yes|l-nsubj": 390,
1480
+ "NOUN|Prefix=Yes|l-obj": 391,
1481
+ "NOUN|Prefix=Yes|l-obl": 392,
1482
+ "NOUN|Prefix=Yes|r-acl": 393,
1483
+ "NOUN|Prefix=Yes|r-advcl": 394,
1484
+ "NOUN|Prefix=Yes|r-advmod": 395,
1485
+ "NOUN|Prefix=Yes|r-appos": 396,
1486
+ "NOUN|Prefix=Yes|r-case": 397,
1487
+ "NOUN|Prefix=Yes|r-cc": 398,
1488
+ "NOUN|Prefix=Yes|r-ccomp": 399,
1489
+ "NOUN|Prefix=Yes|r-clf": 400,
1490
+ "NOUN|Prefix=Yes|r-compound": 401,
1491
+ "NOUN|Prefix=Yes|r-conj": 402,
1492
+ "NOUN|Prefix=Yes|r-dislocated": 403,
1493
+ "NOUN|Prefix=Yes|r-fixed": 404,
1494
+ "NOUN|Prefix=Yes|r-flat": 405,
1495
+ "NOUN|Prefix=Yes|r-iobj": 406,
1496
+ "NOUN|Prefix=Yes|r-list": 407,
1497
+ "NOUN|Prefix=Yes|r-nmod": 408,
1498
+ "NOUN|Prefix=Yes|r-nummod": 409,
1499
+ "NOUN|Prefix=Yes|r-obj": 410,
1500
+ "NOUN|Prefix=Yes|r-obl": 411,
1501
+ "NOUN|Prefix=Yes|r-orphan": 412,
1502
+ "NOUN|Prefix=Yes|r-xcomp": 413,
1503
+ "NOUN|Prefix=Yes|root": 414,
1504
+ "NOUN|_": 415,
1505
+ "NOUN|l-acl": 416,
1506
+ "NOUN|l-advcl": 417,
1507
+ "NOUN|l-advmod": 418,
1508
+ "NOUN|l-aux": 419,
1509
+ "NOUN|l-case": 420,
1510
+ "NOUN|l-cc": 421,
1511
+ "NOUN|l-ccomp": 422,
1512
+ "NOUN|l-compound": 423,
1513
+ "NOUN|l-csubj": 424,
1514
+ "NOUN|l-discourse": 425,
1515
+ "NOUN|l-dislocated": 426,
1516
+ "NOUN|l-expl": 427,
1517
+ "NOUN|l-flat": 428,
1518
+ "NOUN|l-iobj": 429,
1519
+ "NOUN|l-mark": 430,
1520
+ "NOUN|l-nmod": 431,
1521
+ "NOUN|l-nsubj": 432,
1522
+ "NOUN|l-nsubj:pass": 433,
1523
+ "NOUN|l-nummod": 434,
1524
+ "NOUN|l-obj": 435,
1525
+ "NOUN|l-obl": 436,
1526
+ "NOUN|l-obl:tmod": 437,
1527
+ "NOUN|l-orphan": 438,
1528
+ "NOUN|l-vocative": 439,
1529
+ "NOUN|r-acl": 440,
1530
+ "NOUN|r-acl:relcl": 441,
1531
+ "NOUN|r-advcl": 442,
1532
+ "NOUN|r-advmod": 443,
1533
+ "NOUN|r-appos": 444,
1534
+ "NOUN|r-case": 445,
1535
+ "NOUN|r-cc": 446,
1536
+ "NOUN|r-ccomp": 447,
1537
+ "NOUN|r-clf": 448,
1538
+ "NOUN|r-compound": 449,
1539
+ "NOUN|r-conj": 450,
1540
+ "NOUN|r-cop": 451,
1541
+ "NOUN|r-discourse": 452,
1542
+ "NOUN|r-dislocated": 453,
1543
+ "NOUN|r-fixed": 454,
1544
+ "NOUN|r-flat": 455,
1545
+ "NOUN|r-flat:name": 456,
1546
+ "NOUN|r-iobj": 457,
1547
+ "NOUN|r-list": 458,
1548
+ "NOUN|r-mark": 459,
1549
+ "NOUN|r-nmod": 460,
1550
+ "NOUN|r-nmod:poss": 461,
1551
+ "NOUN|r-nsubj": 462,
1552
+ "NOUN|r-nummod": 463,
1553
+ "NOUN|r-obj": 464,
1554
+ "NOUN|r-obl": 465,
1555
+ "NOUN|r-obl:poss": 466,
1556
+ "NOUN|r-obl:tmod": 467,
1557
+ "NOUN|r-orphan": 468,
1558
+ "NOUN|r-parataxis": 469,
1559
+ "NOUN|r-xcomp": 470,
1560
+ "NOUN|root": 471,
1561
+ "NUM": 472,
1562
+ "NUM.": 473,
1563
+ "NUM|Abbr=Yes|_": 474,
1564
+ "NUM|Abbr=Yes|r-flat": 475,
1565
+ "NUM|Abbr=Yes|r-nummod": 476,
1566
+ "NUM|Abbr=Yes|r-obj": 477,
1567
+ "NUM|Foreign=Yes|_": 478,
1568
+ "NUM|Foreign=Yes|r-clf": 479,
1569
+ "NUM|NumType=Mult|_": 480,
1570
+ "NUM|NumType=Mult|l-advmod": 481,
1571
+ "NUM|NumType=Mult|l-nummod": 482,
1572
+ "NUM|NumType=Mult|r-advmod": 483,
1573
+ "NUM|Prefix=Yes|_": 484,
1574
+ "NUM|Prefix=Yes|l-nummod": 485,
1575
+ "NUM|_": 486,
1576
+ "NUM|l-advcl": 487,
1577
+ "NUM|l-advmod": 488,
1578
+ "NUM|l-case": 489,
1579
+ "NUM|l-clf": 490,
1580
+ "NUM|l-dep": 491,
1581
+ "NUM|l-flat": 492,
1582
+ "NUM|l-nmod": 493,
1583
+ "NUM|l-nsubj": 494,
1584
+ "NUM|l-nummod": 495,
1585
+ "NUM|l-obl": 496,
1586
+ "NUM|l-obl:tmod": 497,
1587
+ "NUM|r-acl": 498,
1588
+ "NUM|r-acl:relcl": 499,
1589
+ "NUM|r-advmod": 500,
1590
+ "NUM|r-appos": 501,
1591
+ "NUM|r-ccomp": 502,
1592
+ "NUM|r-clf": 503,
1593
+ "NUM|r-compound": 504,
1594
+ "NUM|r-conj": 505,
1595
+ "NUM|r-det": 506,
1596
+ "NUM|r-fixed": 507,
1597
+ "NUM|r-flat": 508,
1598
+ "NUM|r-flat:name": 509,
1599
+ "NUM|r-iobj": 510,
1600
+ "NUM|r-nmod": 511,
1601
+ "NUM|r-nummod": 512,
1602
+ "NUM|r-obj": 513,
1603
+ "NUM|r-obl": 514,
1604
+ "NUM|r-obl:poss": 515,
1605
+ "NUM|r-obl:tmod": 516,
1606
+ "NUM|r-xcomp": 517,
1607
+ "NUM|root": 518,
1608
+ "PART": 519,
1609
+ "PART.": 520,
1610
+ "PART|Aspect=Perf|_": 521,
1611
+ "PART|Aspect=Perf|l-aux": 522,
1612
+ "PART|Aspect=Perf|r-aux": 523,
1613
+ "PART|Aspect=Perf|r-xcomp": 524,
1614
+ "PART|Aspect=Prog|_": 525,
1615
+ "PART|Aspect=Prog|l-aux": 526,
1616
+ "PART|Aspect=Prog|r-aux": 527,
1617
+ "PART|NameType=Oth|_": 528,
1618
+ "PART|NameType=Oth|l-advmod": 529,
1619
+ "PART|NounType=Class|PartType=Emp|Prefix=Yes|_": 530,
1620
+ "PART|NounType=Class|PartType=Emp|Prefix=Yes|l-mark": 531,
1621
+ "PART|NounType=Class|PartType=Emp|_": 532,
1622
+ "PART|NounType=Class|PartType=Emp|l-mark": 533,
1623
+ "PART|NounType=Class|Prefix=Yes|_": 534,
1624
+ "PART|NounType=Class|Prefix=Yes|l-mark": 535,
1625
+ "PART|NumType=Mult|PartType=Emp|_": 536,
1626
+ "PART|NumType=Mult|PartType=Emp|l-mark": 537,
1627
+ "PART|PartType=Adj|_": 538,
1628
+ "PART|PartType=Adj|l-mark": 539,
1629
+ "PART|PartType=Adj|l-orphan": 540,
1630
+ "PART|PartType=Adj|r-acl": 541,
1631
+ "PART|PartType=Adj|r-compound": 542,
1632
+ "PART|PartType=Adj|r-nmod": 543,
1633
+ "PART|PartType=Adv|_": 544,
1634
+ "PART|PartType=Adv|l-advmod": 545,
1635
+ "PART|PartType=Adv|l-mark": 546,
1636
+ "PART|PartType=Adv|r-advmod": 547,
1637
+ "PART|PartType=Emp|Prefix=Yes|_": 548,
1638
+ "PART|PartType=Emp|Prefix=Yes|l-advmod": 549,
1639
+ "PART|PartType=Emp|Prefix=Yes|l-aux": 550,
1640
+ "PART|PartType=Emp|Prefix=Yes|l-mark": 551,
1641
+ "PART|PartType=Emp|_": 552,
1642
+ "PART|PartType=Emp|l-advmod": 553,
1643
+ "PART|PartType=Emp|l-case": 554,
1644
+ "PART|PartType=Emp|l-discourse": 555,
1645
+ "PART|PartType=Emp|l-mark": 556,
1646
+ "PART|PartType=Emp|r-acl": 557,
1647
+ "PART|PartType=Emp|r-advmod": 558,
1648
+ "PART|PartType=Emp|r-aux": 559,
1649
+ "PART|PartType=Emp|r-compound": 560,
1650
+ "PART|PartType=Emp|r-det": 561,
1651
+ "PART|PartType=Emp|r-fixed": 562,
1652
+ "PART|PartType=Emp|r-mark": 563,
1653
+ "PART|PartType=Emp|r-nmod": 564,
1654
+ "PART|PartType=Enp|_": 565,
1655
+ "PART|PartType=Enp|l-discourse": 566,
1656
+ "PART|PartType=Enp|r-acl": 567,
1657
+ "PART|PartType=Enp|r-advmod": 568,
1658
+ "PART|PartType=Enp|r-compound": 569,
1659
+ "PART|PartType=Enp|r-dep": 570,
1660
+ "PART|PartType=Enp|r-det": 571,
1661
+ "PART|PartType=Enp|r-discourse": 572,
1662
+ "PART|PartType=Enp|r-fixed": 573,
1663
+ "PART|PartType=Enp|r-obl": 574,
1664
+ "PART|PartType=Int|_": 575,
1665
+ "PART|PartType=Int|l-advmod": 576,
1666
+ "PART|PartType=Int|l-mark": 577,
1667
+ "PART|PartType=Int|r-acl": 578,
1668
+ "PART|PartType=Int|r-advmod": 579,
1669
+ "PART|PartType=Int|r-dep": 580,
1670
+ "PART|PartType=Int|r-discourse": 581,
1671
+ "PART|PartType=Int|r-nmod": 582,
1672
+ "PART|PartType=Int|r-obj": 583,
1673
+ "PART|PartType=Int|r-obl": 584,
1674
+ "PART|PartType=Neg|_": 585,
1675
+ "PART|PartType=Neg|l-advcl": 586,
1676
+ "PART|PartType=Neg|l-advmod": 587,
1677
+ "PART|PartType=Neg|l-aux": 588,
1678
+ "PART|PartType=Neg|l-mark": 589,
1679
+ "PART|PartType=Neg|r-acl": 590,
1680
+ "PART|PartType=Neg|r-advmod": 591,
1681
+ "PART|PartType=Neg|r-fixed": 592,
1682
+ "PART|PartType=Res|_": 593,
1683
+ "PART|PartType=Res|r-advmod": 594,
1684
+ "PART|PartType=Res|r-discourse": 595,
1685
+ "PART|PartType=Res|r-fixed": 596,
1686
+ "PART|Polarity=Neg|_": 597,
1687
+ "PART|Polarity=Neg|l-advmod": 598,
1688
+ "PART|Prefix=Yes|_": 599,
1689
+ "PART|Prefix=Yes|l-advmod": 600,
1690
+ "PART|Prefix=Yes|l-aux": 601,
1691
+ "PART|Prefix=Yes|l-mark": 602,
1692
+ "PART|Prefix=Yes|r-acl": 603,
1693
+ "PART|Prefix=Yes|r-nmod": 604,
1694
+ "PART|PronType=Int|_": 605,
1695
+ "PART|PronType=Int|r-acl": 606,
1696
+ "PART|PronType=Int|r-advmod": 607,
1697
+ "PART|PronType=Int|r-discourse": 608,
1698
+ "PART|PronType=Int|r-obj": 609,
1699
+ "PART|PronType=Int|root": 610,
1700
+ "PART|_": 611,
1701
+ "PART|l-advmod": 612,
1702
+ "PART|l-cc": 613,
1703
+ "PART|l-cc:preconj": 614,
1704
+ "PART|l-discourse": 615,
1705
+ "PART|l-mark": 616,
1706
+ "PART|l-nsubj": 617,
1707
+ "PART|r-acl": 618,
1708
+ "PART|r-advmod": 619,
1709
+ "PART|r-aux": 620,
1710
+ "PART|r-ccomp": 621,
1711
+ "PART|r-clf": 622,
1712
+ "PART|r-compound": 623,
1713
+ "PART|r-compound:prt": 624,
1714
+ "PART|r-conj": 625,
1715
+ "PART|r-discourse": 626,
1716
+ "PART|r-fixed": 627,
1717
+ "PART|r-mark": 628,
1718
+ "PART|r-nmod": 629,
1719
+ "PART|r-nmod:poss": 630,
1720
+ "PART|r-obj": 631,
1721
+ "PART|r-obl": 632,
1722
+ "PART|root": 633,
1723
+ "PRON": 634,
1724
+ "PRON.": 635,
1725
+ "PRON|NounType=Class|_": 636,
1726
+ "PRON|NounType=Class|r-clf": 637,
1727
+ "PRON|Person=1|_": 638,
1728
+ "PRON|Person=1|l-nsubj": 639,
1729
+ "PRON|Person=1|l-nsubj:pass": 640,
1730
+ "PRON|Person=1|r-compound": 641,
1731
+ "PRON|Person=1|r-nmod:poss": 642,
1732
+ "PRON|Person=1|r-obj": 643,
1733
+ "PRON|Person=1|r-obl": 644,
1734
+ "PRON|Person=1|r-obl:poss": 645,
1735
+ "PRON|Person=2|_": 646,
1736
+ "PRON|Person=2|l-nsubj": 647,
1737
+ "PRON|Person=2|r-compound": 648,
1738
+ "PRON|Person=2|r-nmod:poss": 649,
1739
+ "PRON|Person=2|r-obj": 650,
1740
+ "PRON|Person=2|r-obl": 651,
1741
+ "PRON|Person=3|_": 652,
1742
+ "PRON|Person=3|l-advmod": 653,
1743
+ "PRON|Person=3|l-nsubj": 654,
1744
+ "PRON|Person=3|l-nsubj:pass": 655,
1745
+ "PRON|Person=3|l-reparandum": 656,
1746
+ "PRON|Person=3|r-appos": 657,
1747
+ "PRON|Person=3|r-compound": 658,
1748
+ "PRON|Person=3|r-conj": 659,
1749
+ "PRON|Person=3|r-nmod": 660,
1750
+ "PRON|Person=3|r-nmod:poss": 661,
1751
+ "PRON|Person=3|r-obj": 662,
1752
+ "PRON|Person=3|r-obl": 663,
1753
+ "PRON|Person=3|r-obl:poss": 664,
1754
+ "PRON|Person=3|r-xcomp": 665,
1755
+ "PRON|PronType=Int|_": 666,
1756
+ "PRON|PronType=Int|l-nsubj": 667,
1757
+ "PRON|PronType=Int|r-obj": 668,
1758
+ "PRON|PronType=Int|r-obl": 669,
1759
+ "PRON|PronType=Int|root": 670,
1760
+ "PRON|PronType=Prs|_": 671,
1761
+ "PRON|PronType=Prs|l-advmod": 672,
1762
+ "PRON|PronType=Prs|l-expl": 673,
1763
+ "PRON|PronType=Prs|l-nsubj": 674,
1764
+ "PRON|PronType=Prs|l-obj": 675,
1765
+ "PRON|PronType=Prs|l-obl": 676,
1766
+ "PRON|PronType=Prs|r-advcl": 677,
1767
+ "PRON|PronType=Prs|r-advmod": 678,
1768
+ "PRON|PronType=Prs|r-ccomp": 679,
1769
+ "PRON|PronType=Prs|r-clf": 680,
1770
+ "PRON|PronType=Prs|r-conj": 681,
1771
+ "PRON|PronType=Prs|r-nmod": 682,
1772
+ "PRON|PronType=Prs|r-nsubj": 683,
1773
+ "PRON|PronType=Prs|r-obj": 684,
1774
+ "PRON|PronType=Prs|r-obl": 685,
1775
+ "PRON|PronType=Prs|root": 686,
1776
+ "PRON|PronType=Rcp|_": 687,
1777
+ "PRON|PronType=Rcp|r-advmod": 688,
1778
+ "PRON|PronType=Rcp|r-iobj": 689,
1779
+ "PRON|PronType=Rcp|r-nmod": 690,
1780
+ "PRON|PronType=Rcp|r-obj": 691,
1781
+ "PRON|PronType=Rcp|r-obl": 692,
1782
+ "PRON|_": 693,
1783
+ "PRON|l-advcl": 694,
1784
+ "PRON|l-advmod": 695,
1785
+ "PRON|l-compound": 696,
1786
+ "PRON|l-csubj": 697,
1787
+ "PRON|l-dislocated": 698,
1788
+ "PRON|l-expl": 699,
1789
+ "PRON|l-iobj": 700,
1790
+ "PRON|l-mark": 701,
1791
+ "PRON|l-nmod": 702,
1792
+ "PRON|l-nsubj": 703,
1793
+ "PRON|l-obj": 704,
1794
+ "PRON|l-obl": 705,
1795
+ "PRON|r-acl": 706,
1796
+ "PRON|r-acl:relcl": 707,
1797
+ "PRON|r-advcl": 708,
1798
+ "PRON|r-advmod": 709,
1799
+ "PRON|r-appos": 710,
1800
+ "PRON|r-ccomp": 711,
1801
+ "PRON|r-compound": 712,
1802
+ "PRON|r-conj": 713,
1803
+ "PRON|r-det": 714,
1804
+ "PRON|r-discourse": 715,
1805
+ "PRON|r-fixed": 716,
1806
+ "PRON|r-flat": 717,
1807
+ "PRON|r-iobj": 718,
1808
+ "PRON|r-nmod": 719,
1809
+ "PRON|r-nmod:poss": 720,
1810
+ "PRON|r-nsubj": 721,
1811
+ "PRON|r-obj": 722,
1812
+ "PRON|r-obl": 723,
1813
+ "PRON|r-obl:poss": 724,
1814
+ "PRON|r-xcomp": 725,
1815
+ "PRON|root": 726,
1816
+ "PROPN": 727,
1817
+ "PROPN.": 728,
1818
+ "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth|_": 729,
1819
+ "PROPN|Abbr=Yes|Foreign=Yes|NameType=Oth|r-obj": 730,
1820
+ "PROPN|Abbr=Yes|NameType=Com|_": 731,
1821
+ "PROPN|Abbr=Yes|NameType=Com|r-advmod": 732,
1822
+ "PROPN|Abbr=Yes|NameType=Com|r-nmod": 733,
1823
+ "PROPN|Abbr=Yes|_": 734,
1824
+ "PROPN|Abbr=Yes|l-nmod": 735,
1825
+ "PROPN|Abbr=Yes|l-nsubj": 736,
1826
+ "PROPN|Abbr=Yes|r-nmod": 737,
1827
+ "PROPN|Foreign=Yes|NameType=Com|_": 738,
1828
+ "PROPN|Foreign=Yes|NameType=Com|l-nsubj": 739,
1829
+ "PROPN|Foreign=Yes|NameType=Com|r-list": 740,
1830
+ "PROPN|Foreign=Yes|NameType=Com|r-nmod": 741,
1831
+ "PROPN|Foreign=Yes|NameType=Com|r-obl": 742,
1832
+ "PROPN|Foreign=Yes|NameType=Geo|_": 743,
1833
+ "PROPN|Foreign=Yes|NameType=Geo|r-obj": 744,
1834
+ "PROPN|Foreign=Yes|NameType=Geo|r-obl": 745,
1835
+ "PROPN|Foreign=Yes|NameType=Giv|_": 746,
1836
+ "PROPN|Foreign=Yes|NameType=Giv|l-nsubj": 747,
1837
+ "PROPN|Foreign=Yes|NameType=Oth|_": 748,
1838
+ "PROPN|Foreign=Yes|NameType=Oth|r-conj": 749,
1839
+ "PROPN|Foreign=Yes|NameType=Oth|r-flat": 750,
1840
+ "PROPN|Foreign=Yes|NameType=Oth|r-nmod": 751,
1841
+ "PROPN|Foreign=Yes|NameType=Prs|_": 752,
1842
+ "PROPN|Foreign=Yes|NameType=Prs|l-flat": 753,
1843
+ "PROPN|Foreign=Yes|NameType=Prs|l-nsubj": 754,
1844
+ "PROPN|Foreign=Yes|NameType=Prs|r-conj": 755,
1845
+ "PROPN|Foreign=Yes|NameType=Prs|r-flat": 756,
1846
+ "PROPN|Foreign=Yes|NameType=Prs|r-nmod": 757,
1847
+ "PROPN|Foreign=Yes|NameType=Prs|r-obj": 758,
1848
+ "PROPN|Foreign=Yes|NameType=Prs|r-obl": 759,
1849
+ "PROPN|Foreign=Yes|NameType=Sur|_": 760,
1850
+ "PROPN|Foreign=Yes|NameType=Sur|r-flat": 761,
1851
+ "PROPN|Foreign=Yes|_": 762,
1852
+ "PROPN|Foreign=Yes|l-flat": 763,
1853
+ "PROPN|Foreign=Yes|l-nmod": 764,
1854
+ "PROPN|Foreign=Yes|l-nsubj": 765,
1855
+ "PROPN|Foreign=Yes|l-obl": 766,
1856
+ "PROPN|Foreign=Yes|r-appos": 767,
1857
+ "PROPN|Foreign=Yes|r-ccomp": 768,
1858
+ "PROPN|Foreign=Yes|r-compound": 769,
1859
+ "PROPN|Foreign=Yes|r-conj": 770,
1860
+ "PROPN|Foreign=Yes|r-flat": 771,
1861
+ "PROPN|Foreign=Yes|r-iobj": 772,
1862
+ "PROPN|Foreign=Yes|r-list": 773,
1863
+ "PROPN|Foreign=Yes|r-nmod": 774,
1864
+ "PROPN|Foreign=Yes|r-nsubj": 775,
1865
+ "PROPN|Foreign=Yes|r-obj": 776,
1866
+ "PROPN|Foreign=Yes|r-obl": 777,
1867
+ "PROPN|Foreign=Yes|root": 778,
1868
+ "PROPN|NameType=Com|_": 779,
1869
+ "PROPN|NameType=Com|l-nsubj": 780,
1870
+ "PROPN|NameType=Com|l-obl": 781,
1871
+ "PROPN|NameType=Com|r-appos": 782,
1872
+ "PROPN|NameType=Com|r-conj": 783,
1873
+ "PROPN|NameType=Com|r-flat": 784,
1874
+ "PROPN|NameType=Com|r-list": 785,
1875
+ "PROPN|NameType=Com|r-nmod": 786,
1876
+ "PROPN|NameType=Com|r-nsubj": 787,
1877
+ "PROPN|NameType=Com|r-obj": 788,
1878
+ "PROPN|NameType=Com|r-obl": 789,
1879
+ "PROPN|NameType=Geo|_": 790,
1880
+ "PROPN|NameType=Geo|l-nsubj": 791,
1881
+ "PROPN|NameType=Geo|l-obl": 792,
1882
+ "PROPN|NameType=Geo|r-compound": 793,
1883
+ "PROPN|NameType=Geo|r-conj": 794,
1884
+ "PROPN|NameType=Geo|r-flat": 795,
1885
+ "PROPN|NameType=Geo|r-list": 796,
1886
+ "PROPN|NameType=Geo|r-nmod": 797,
1887
+ "PROPN|NameType=Geo|r-nsubj": 798,
1888
+ "PROPN|NameType=Geo|r-nummod": 799,
1889
+ "PROPN|NameType=Geo|r-obj": 800,
1890
+ "PROPN|NameType=Geo|r-obl": 801,
1891
+ "PROPN|NameType=Geo|root": 802,
1892
+ "PROPN|NameType=Giv|_": 803,
1893
+ "PROPN|NameType=Giv|l-dislocated": 804,
1894
+ "PROPN|NameType=Giv|l-nsubj": 805,
1895
+ "PROPN|NameType=Giv|l-obl": 806,
1896
+ "PROPN|NameType=Giv|r-acl": 807,
1897
+ "PROPN|NameType=Giv|r-appos": 808,
1898
+ "PROPN|NameType=Giv|r-ccomp": 809,
1899
+ "PROPN|NameType=Giv|r-conj": 810,
1900
+ "PROPN|NameType=Giv|r-flat": 811,
1901
+ "PROPN|NameType=Giv|r-list": 812,
1902
+ "PROPN|NameType=Giv|r-nmod": 813,
1903
+ "PROPN|NameType=Giv|r-nsubj": 814,
1904
+ "PROPN|NameType=Giv|r-obj": 815,
1905
+ "PROPN|NameType=Giv|r-obl": 816,
1906
+ "PROPN|NameType=Giv|root": 817,
1907
+ "PROPN|NameType=Nat|_": 818,
1908
+ "PROPN|NameType=Nat|l-csubj": 819,
1909
+ "PROPN|NameType=Nat|l-nsubj": 820,
1910
+ "PROPN|NameType=Nat|l-obl": 821,
1911
+ "PROPN|NameType=Nat|r-acl": 822,
1912
+ "PROPN|NameType=Nat|r-appos": 823,
1913
+ "PROPN|NameType=Nat|r-compound": 824,
1914
+ "PROPN|NameType=Nat|r-conj": 825,
1915
+ "PROPN|NameType=Nat|r-flat": 826,
1916
+ "PROPN|NameType=Nat|r-list": 827,
1917
+ "PROPN|NameType=Nat|r-nmod": 828,
1918
+ "PROPN|NameType=Nat|r-nummod": 829,
1919
+ "PROPN|NameType=Nat|r-obj": 830,
1920
+ "PROPN|NameType=Nat|r-obl": 831,
1921
+ "PROPN|NameType=Oth|_": 832,
1922
+ "PROPN|NameType=Oth|l-dislocated": 833,
1923
+ "PROPN|NameType=Oth|l-nsubj": 834,
1924
+ "PROPN|NameType=Oth|r-acl": 835,
1925
+ "PROPN|NameType=Oth|r-appos": 836,
1926
+ "PROPN|NameType=Oth|r-compound": 837,
1927
+ "PROPN|NameType=Oth|r-conj": 838,
1928
+ "PROPN|NameType=Oth|r-flat": 839,
1929
+ "PROPN|NameType=Oth|r-nmod": 840,
1930
+ "PROPN|NameType=Oth|r-obj": 841,
1931
+ "PROPN|NameType=Oth|r-obl": 842,
1932
+ "PROPN|NameType=Oth|root": 843,
1933
+ "PROPN|NameType=Pro|_": 844,
1934
+ "PROPN|NameType=Pro|l-nsubj": 845,
1935
+ "PROPN|NameType=Pro|l-obl": 846,
1936
+ "PROPN|NameType=Pro|r-advcl": 847,
1937
+ "PROPN|NameType=Pro|r-flat": 848,
1938
+ "PROPN|NameType=Pro|r-nmod": 849,
1939
+ "PROPN|NameType=Pro|r-obj": 850,
1940
+ "PROPN|NameType=Prs|_": 851,
1941
+ "PROPN|NameType=Prs|l-dislocated": 852,
1942
+ "PROPN|NameType=Prs|l-nsubj": 853,
1943
+ "PROPN|NameType=Prs|l-obl": 854,
1944
+ "PROPN|NameType=Prs|l-vocative": 855,
1945
+ "PROPN|NameType=Prs|r-conj": 856,
1946
+ "PROPN|NameType=Prs|r-discourse": 857,
1947
+ "PROPN|NameType=Prs|r-flat": 858,
1948
+ "PROPN|NameType=Prs|r-list": 859,
1949
+ "PROPN|NameType=Prs|r-nmod": 860,
1950
+ "PROPN|NameType=Prs|r-obj": 861,
1951
+ "PROPN|NameType=Prs|r-obl": 862,
1952
+ "PROPN|NameType=Prs|r-vocative": 863,
1953
+ "PROPN|NameType=Sur|_": 864,
1954
+ "PROPN|NameType=Sur|l-nsubj": 865,
1955
+ "PROPN|NameType=Sur|r-flat": 866,
1956
+ "PROPN|NameType=Sur|r-nmod": 867,
1957
+ "PROPN|NounType=Class|_": 868,
1958
+ "PROPN|NounType=Class|r-clf": 869,
1959
+ "PROPN|Prefix=Yes|_": 870,
1960
+ "PROPN|Prefix=Yes|l-nsubj": 871,
1961
+ "PROPN|Prefix=Yes|r-nmod": 872,
1962
+ "PROPN|_": 873,
1963
+ "PROPN|l-advmod": 874,
1964
+ "PROPN|l-aux": 875,
1965
+ "PROPN|l-compound": 876,
1966
+ "PROPN|l-nsubj": 877,
1967
+ "PROPN|l-nsubj:pass": 878,
1968
+ "PROPN|l-obj": 879,
1969
+ "PROPN|l-obl": 880,
1970
+ "PROPN|l-obl:tmod": 881,
1971
+ "PROPN|r-acl": 882,
1972
+ "PROPN|r-acl:relcl": 883,
1973
+ "PROPN|r-advmod": 884,
1974
+ "PROPN|r-appos": 885,
1975
+ "PROPN|r-cc": 886,
1976
+ "PROPN|r-ccomp": 887,
1977
+ "PROPN|r-clf": 888,
1978
+ "PROPN|r-compound": 889,
1979
+ "PROPN|r-conj": 890,
1980
+ "PROPN|r-fixed": 891,
1981
+ "PROPN|r-flat": 892,
1982
+ "PROPN|r-flat:name": 893,
1983
+ "PROPN|r-goeswith": 894,
1984
+ "PROPN|r-iobj": 895,
1985
+ "PROPN|r-list": 896,
1986
+ "PROPN|r-nmod": 897,
1987
+ "PROPN|r-nmod:poss": 898,
1988
+ "PROPN|r-obj": 899,
1989
+ "PROPN|r-obl": 900,
1990
+ "PROPN|r-obl:poss": 901,
1991
+ "PROPN|r-obl:tmod": 902,
1992
+ "PROPN|r-xcomp": 903,
1993
+ "PROPN|root": 904,
1994
+ "PUNCT": 905,
1995
+ "PUNCT.": 906,
1996
+ "PUNCT|NounType=Class|_": 907,
1997
+ "PUNCT|NounType=Class|r-punct": 908,
1998
+ "PUNCT|_": 909,
1999
+ "PUNCT|l-advmod": 910,
2000
+ "PUNCT|l-dep": 911,
2001
+ "PUNCT|l-punct": 912,
2002
+ "PUNCT|r-advmod": 913,
2003
+ "PUNCT|r-clf": 914,
2004
+ "PUNCT|r-dep": 915,
2005
+ "PUNCT|r-punct": 916,
2006
+ "PUNCT|root": 917,
2007
+ "SCONJ": 918,
2008
+ "SCONJ.": 919,
2009
+ "SCONJ|NumType=Mult|_": 920,
2010
+ "SCONJ|NumType=Mult|l-mark": 921,
2011
+ "SCONJ|Prefix=Yes|_": 922,
2012
+ "SCONJ|Prefix=Yes|l-cc": 923,
2013
+ "SCONJ|Prefix=Yes|l-mark": 924,
2014
+ "SCONJ|VerbType=Cop|_": 925,
2015
+ "SCONJ|VerbType=Cop|l-mark": 926,
2016
+ "SCONJ|_": 927,
2017
+ "SCONJ|l-advmod": 928,
2018
+ "SCONJ|l-case": 929,
2019
+ "SCONJ|l-cc": 930,
2020
+ "SCONJ|l-discourse": 931,
2021
+ "SCONJ|l-mark": 932,
2022
+ "SCONJ|l-nsubj": 933,
2023
+ "SCONJ|l-orphan": 934,
2024
+ "SCONJ|r-advcl": 935,
2025
+ "SCONJ|r-compound": 936,
2026
+ "SCONJ|r-fixed": 937,
2027
+ "SCONJ|r-flat": 938,
2028
+ "SCONJ|r-mark": 939,
2029
+ "SCONJ|r-orphan": 940,
2030
+ "SCONJ|root": 941,
2031
+ "SYM": 942,
2032
+ "SYM.": 943,
2033
+ "SYM|_": 944,
2034
+ "SYM|l-dep": 945,
2035
+ "SYM|l-nsubj": 946,
2036
+ "SYM|r-advmod": 947,
2037
+ "SYM|r-clf": 948,
2038
+ "SYM|r-nmod": 949,
2039
+ "SYM|r-obj": 950,
2040
+ "SYM|r-obl": 951,
2041
+ "SYM|r-xcomp": 952,
2042
+ "VERB": 953,
2043
+ "VERB.": 954,
2044
+ "VERB|Abbr=Yes|_": 955,
2045
+ "VERB|Abbr=Yes|r-acl": 956,
2046
+ "VERB|Foreign=Yes|_": 957,
2047
+ "VERB|Foreign=Yes|l-nsubj": 958,
2048
+ "VERB|Foreign=Yes|r-acl": 959,
2049
+ "VERB|Foreign=Yes|r-advcl": 960,
2050
+ "VERB|Foreign=Yes|r-ccomp": 961,
2051
+ "VERB|Foreign=Yes|r-compound": 962,
2052
+ "VERB|Foreign=Yes|r-conj": 963,
2053
+ "VERB|Foreign=Yes|r-flat": 964,
2054
+ "VERB|Foreign=Yes|r-nmod": 965,
2055
+ "VERB|Foreign=Yes|r-xcomp": 966,
2056
+ "VERB|Foreign=Yes|root": 967,
2057
+ "VERB|Mood=Imp|_": 968,
2058
+ "VERB|Mood=Imp|r-xcomp": 969,
2059
+ "VERB|NounType=Class|_": 970,
2060
+ "VERB|NounType=Class|r-acl": 971,
2061
+ "VERB|NounType=Class|r-compound": 972,
2062
+ "VERB|PartType=Adj|_": 973,
2063
+ "VERB|PartType=Adj|r-acl": 974,
2064
+ "VERB|Prefix=Yes|_": 975,
2065
+ "VERB|Prefix=Yes|l-acl": 976,
2066
+ "VERB|Prefix=Yes|l-nsubj": 977,
2067
+ "VERB|Prefix=Yes|r-acl": 978,
2068
+ "VERB|Prefix=Yes|r-advcl": 979,
2069
+ "VERB|Prefix=Yes|r-ccomp": 980,
2070
+ "VERB|Prefix=Yes|r-compound": 981,
2071
+ "VERB|Prefix=Yes|r-conj": 982,
2072
+ "VERB|Prefix=Yes|r-parataxis": 983,
2073
+ "VERB|Prefix=Yes|root": 984,
2074
+ "VERB|VerbType=Cop|_": 985,
2075
+ "VERB|VerbType=Cop|l-advmod": 986,
2076
+ "VERB|VerbType=Cop|l-cop": 987,
2077
+ "VERB|VerbType=Cop|r-acl": 988,
2078
+ "VERB|VerbType=Cop|r-advcl": 989,
2079
+ "VERB|VerbType=Cop|r-ccomp": 990,
2080
+ "VERB|VerbType=Cop|r-compound": 991,
2081
+ "VERB|VerbType=Cop|r-parataxis": 992,
2082
+ "VERB|VerbType=Cop|root": 993,
2083
+ "VERB|_": 994,
2084
+ "VERB|l-acl": 995,
2085
+ "VERB|l-acl:relcl": 996,
2086
+ "VERB|l-advcl": 997,
2087
+ "VERB|l-advmod": 998,
2088
+ "VERB|l-case": 999,
2089
+ "VERB|l-cc": 1000,
2090
+ "VERB|l-ccomp": 1001,
2091
+ "VERB|l-compound": 1002,
2092
+ "VERB|l-conj": 1003,
2093
+ "VERB|l-cop": 1004,
2094
+ "VERB|l-csubj": 1005,
2095
+ "VERB|l-discourse": 1006,
2096
+ "VERB|l-dislocated": 1007,
2097
+ "VERB|l-mark": 1008,
2098
+ "VERB|l-nmod": 1009,
2099
+ "VERB|l-nsubj": 1010,
2100
+ "VERB|l-obj": 1011,
2101
+ "VERB|l-obl": 1012,
2102
+ "VERB|l-orphan": 1013,
2103
+ "VERB|l-xcomp": 1014,
2104
+ "VERB|r-acl": 1015,
2105
+ "VERB|r-acl:relcl": 1016,
2106
+ "VERB|r-advcl": 1017,
2107
+ "VERB|r-advmod": 1018,
2108
+ "VERB|r-appos": 1019,
2109
+ "VERB|r-case": 1020,
2110
+ "VERB|r-cc": 1021,
2111
+ "VERB|r-ccomp": 1022,
2112
+ "VERB|r-clf": 1023,
2113
+ "VERB|r-compound": 1024,
2114
+ "VERB|r-conj": 1025,
2115
+ "VERB|r-dep": 1026,
2116
+ "VERB|r-det": 1027,
2117
+ "VERB|r-discourse": 1028,
2118
+ "VERB|r-fixed": 1029,
2119
+ "VERB|r-flat": 1030,
2120
+ "VERB|r-list": 1031,
2121
+ "VERB|r-mark": 1032,
2122
+ "VERB|r-nmod": 1033,
2123
+ "VERB|r-nmod:poss": 1034,
2124
+ "VERB|r-nsubj": 1035,
2125
+ "VERB|r-obj": 1036,
2126
+ "VERB|r-obl": 1037,
2127
+ "VERB|r-obl:poss": 1038,
2128
+ "VERB|r-orphan": 1039,
2129
+ "VERB|r-parataxis": 1040,
2130
+ "VERB|r-punct": 1041,
2131
+ "VERB|r-xcomp": 1042,
2132
+ "VERB|root": 1043
2133
+ },
2134
+ "layer_norm_eps": 1e-05,
2135
+ "local_attention": 128,
2136
+ "local_rope_theta": 10000.0,
2137
+ "max_position_embeddings": 8192,
2138
+ "mlp_bias": false,
2139
+ "mlp_dropout": 0.0,
2140
+ "model_type": "modernbert",
2141
+ "norm_bias": false,
2142
+ "norm_eps": 1e-05,
2143
+ "num_attention_heads": 16,
2144
+ "num_hidden_layers": 28,
2145
+ "pad_token_id": 1,
2146
+ "position_embedding_type": "absolute",
2147
+ "reference_compile": true,
2148
+ "repad_logits_with_grad": false,
2149
+ "sep_token_id": 2,
2150
+ "sparse_pred_ignore_index": -100,
2151
+ "sparse_prediction": false,
2152
+ "tokenizer_class": "DebertaV2TokenizerFast",
2153
+ "torch_dtype": "float32",
2154
+ "transformers_version": "4.49.0.dev0",
2155
+ "vocab_size": 2803
2156
+ }
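Each entry in the `label2id` table above packs a UPOS tag (with `B-`/`I-` variants for subword continuation and a trailing `.` as an auxiliary variant assigned by `maker.py` below), any FEATS, and either `_` (no arc recorded), `root`, or a dependency relation prefixed with `l-` (head to the right) or `r-` (head to the left) into one `|`-separated string. A minimal decoding sketch; `decode_label` is a hypothetical helper, not part of this repository:

```py
def decode_label(label):
    parts = label.split("|")
    upos = parts[0]                      # may carry B-/I- (subword pieces) or a trailing "."
    if len(parts) == 1:                  # plain tag such as "NUM" or "NUM."
        return upos, [], None, None
    feats = [p for p in parts[1:-1] if "=" in p]
    dep = parts[-1]
    if dep == "_":                       # no arc recorded for this item
        return upos, feats, None, None
    if dep == "root":
        return upos, feats, None, "root"
    direction, _, deprel = dep.partition("-")  # "l": head to the right, "r": head to the left
    return upos, feats, direction, deprel

print(decode_label("NOUN|NounType=Class|r-clf"))  # ('NOUN', ['NounType=Class'], 'r', 'clf')
```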
configuration_modernbert.py ADDED
@@ -0,0 +1,213 @@
1
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
2
+ # This file was automatically generated from src/transformers/models/modernbert/modular_modernbert.py.
3
+ # Do NOT edit this file manually as any edits will be overwritten by the generation of
4
+ # the file from the modular. If any change should be done, please apply the change to the
5
+ # modular_modernbert.py file directly. One of our CI enforces this.
6
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
7
+ # Copyright 2024 Answer.AI, LightOn, and contributors, and the HuggingFace Inc. team. All rights reserved.
8
+ #
9
+ #
10
+ # Licensed under the Apache License, Version 2.0 (the "License");
11
+ # you may not use this file except in compliance with the License.
12
+ # You may obtain a copy of the License at
13
+ #
14
+ # http://www.apache.org/licenses/LICENSE-2.0
15
+ #
16
+ # Unless required by applicable law or agreed to in writing, software
17
+ # distributed under the License is distributed on an "AS IS" BASIS,
18
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
19
+ # See the License for the specific language governing permissions and
20
+ # limitations under the License.
21
+
22
+ from typing import Literal
23
+
24
+ from transformers.configuration_utils import PretrainedConfig
25
+
26
+
27
+ class ModernBertConfig(PretrainedConfig):
28
+ r"""
29
+ This is the configuration class to store the configuration of a [`ModernBertModel`]. It is used to instantiate an ModernBert
30
+ model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
31
+ defaults will yield a similar configuration to that of the ModernBERT-base.
32
+ e.g. [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
33
+
34
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
35
+ documentation from [`PretrainedConfig`] for more information.
36
+
37
+ Args:
38
+ vocab_size (`int`, *optional*, defaults to 50368):
39
+ Vocabulary size of the ModernBert model. Defines the number of different tokens that can be represented by the
40
+ `inputs_ids` passed when calling [`ModernBertModel`]
41
+ hidden_size (`int`, *optional*, defaults to 768):
42
+ Dimension of the hidden representations.
43
+ intermediate_size (`int`, *optional*, defaults to 1152):
44
+ Dimension of the MLP representations.
45
+ num_hidden_layers (`int`, *optional*, defaults to 22):
46
+ Number of hidden layers in the Transformer decoder.
47
+ num_attention_heads (`int`, *optional*, defaults to 12):
48
+ Number of attention heads for each attention layer in the Transformer decoder.
49
+ hidden_activation (`str` or `function`, *optional*, defaults to `"gelu"`):
50
+ The non-linear activation function (function or string) in the decoder. Will default to `"gelu"`
51
+ if not specified.
52
+ max_position_embeddings (`int`, *optional*, defaults to 8192):
53
+ The maximum sequence length that this model might ever be used with.
54
+ initializer_range (`float`, *optional*, defaults to 0.02):
55
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
56
+ initializer_cutoff_factor (`float`, *optional*, defaults to 2.0):
57
+ The cutoff factor for the truncated_normal_initializer for initializing all weight matrices.
58
+ norm_eps (`float`, *optional*, defaults to 1e-05):
59
+ The epsilon used by the rms normalization layers.
60
+ norm_bias (`bool`, *optional*, defaults to `False`):
61
+ Whether to use bias in the normalization layers.
62
+ pad_token_id (`int`, *optional*, defaults to 50283):
63
+ Padding token id.
64
+ eos_token_id (`int`, *optional*, defaults to 50282):
65
+ End of stream token id.
66
+ bos_token_id (`int`, *optional*, defaults to 50281):
67
+ Beginning of stream token id.
68
+ cls_token_id (`int`, *optional*, defaults to 50281):
69
+ Classification token id.
70
+ sep_token_id (`int`, *optional*, defaults to 50282):
71
+ Separation token id.
72
+ global_rope_theta (`float`, *optional*, defaults to 160000.0):
73
+ The base period of the global RoPE embeddings.
74
+ attention_bias (`bool`, *optional*, defaults to `False`):
75
+ Whether to use a bias in the query, key, value and output projection layers during self-attention.
76
+ attention_dropout (`float`, *optional*, defaults to 0.0):
77
+ The dropout ratio for the attention probabilities.
78
+ global_attn_every_n_layers (`int`, *optional*, defaults to 3):
79
+ The number of layers between global attention layers.
80
+ local_attention (`int`, *optional*, defaults to 128):
81
+ The window size for local attention.
82
+ local_rope_theta (`float`, *optional*, defaults to 10000.0):
83
+ The base period of the local RoPE embeddings.
84
+ embedding_dropout (`float`, *optional*, defaults to 0.0):
85
+ The dropout ratio for the embeddings.
86
+ mlp_bias (`bool`, *optional*, defaults to `False`):
87
+ Whether to use bias in the MLP layers.
88
+ mlp_dropout (`float`, *optional*, defaults to 0.0):
89
+ The dropout ratio for the MLP layers.
90
+ decoder_bias (`bool`, *optional*, defaults to `True`):
91
+ Whether to use bias in the decoder layers.
92
+ classifier_pooling (`str`, *optional*, defaults to `"cls"`):
93
+ The pooling method for the classifier. Should be either `"cls"` or `"mean"`. In local attention layers, the
94
+ CLS token doesn't attend to all tokens on long sequences.
95
+ classifier_dropout (`float`, *optional*, defaults to 0.0):
96
+ The dropout ratio for the classifier.
97
+ classifier_bias (`bool`, *optional*, defaults to `False`):
98
+ Whether to use bias in the classifier.
99
+ classifier_activation (`str`, *optional*, defaults to `"gelu"`):
100
+ The activation function for the classifier.
101
+ deterministic_flash_attn (`bool`, *optional*, defaults to `False`):
102
+ Whether to use deterministic flash attention. If `False`, inference will be faster but not deterministic.
103
+ sparse_prediction (`bool`, *optional*, defaults to `False`):
104
+ Whether to use sparse prediction for the masked language model instead of returning the full dense logits.
105
+ sparse_pred_ignore_index (`int`, *optional*, defaults to -100):
106
+ The index to ignore for the sparse prediction.
107
+ reference_compile (`bool`, *optional*):
108
+ Whether to compile the layers of the model which were compiled during pretraining. If `None`, then parts of
109
+ the model will be compiled if 1) `triton` is installed, 2) the model is not on MPS, 3) the model is not
110
+ shared between devices, and 4) the model is not resized after initialization. If `True`, then the model may
111
+ be faster in some scenarios.
112
+
113
+ Examples:
114
+
115
+ ```python
116
+ >>> from transformers import ModernBertModel, ModernBertConfig
117
+
118
+ >>> # Initializing a ModernBert style configuration
119
+ >>> configuration = ModernBertConfig()
120
+
121
+ >>> # Initializing a model from the modernbert-base style configuration
122
+ >>> model = ModernBertModel(configuration)
123
+
124
+ >>> # Accessing the model configuration
125
+ >>> configuration = model.config
126
+ ```"""
127
+
128
+ model_type = "modernbert"
129
+ keys_to_ignore_at_inference = ["past_key_values"]
130
+
131
+ def __init__(
132
+ self,
133
+ vocab_size=50368,
134
+ hidden_size=768,
135
+ intermediate_size=1152,
136
+ num_hidden_layers=22,
137
+ num_attention_heads=12,
138
+ hidden_activation="gelu",
139
+ max_position_embeddings=8192,
140
+ initializer_range=0.02,
141
+ initializer_cutoff_factor=2.0,
142
+ norm_eps=1e-5,
143
+ norm_bias=False,
144
+ pad_token_id=50283,
145
+ eos_token_id=50282,
146
+ bos_token_id=50281,
147
+ cls_token_id=50281,
148
+ sep_token_id=50282,
149
+ global_rope_theta=160000.0,
150
+ attention_bias=False,
151
+ attention_dropout=0.0,
152
+ global_attn_every_n_layers=3,
153
+ local_attention=128,
154
+ local_rope_theta=10000.0,
155
+ embedding_dropout=0.0,
156
+ mlp_bias=False,
157
+ mlp_dropout=0.0,
158
+ decoder_bias=True,
159
+ classifier_pooling: Literal["cls", "mean"] = "cls",
160
+ classifier_dropout=0.0,
161
+ classifier_bias=False,
162
+ classifier_activation="gelu",
163
+ deterministic_flash_attn=False,
164
+ sparse_prediction=False,
165
+ sparse_pred_ignore_index=-100,
166
+ reference_compile=None,
167
+ **kwargs,
168
+ ):
169
+ super().__init__(
170
+ pad_token_id=pad_token_id,
171
+ bos_token_id=bos_token_id,
172
+ eos_token_id=eos_token_id,
173
+ cls_token_id=cls_token_id,
174
+ sep_token_id=sep_token_id,
175
+ **kwargs,
176
+ )
177
+ self.vocab_size = vocab_size
178
+ self.max_position_embeddings = max_position_embeddings
179
+ self.hidden_size = hidden_size
180
+ self.intermediate_size = intermediate_size
181
+ self.num_hidden_layers = num_hidden_layers
182
+ self.num_attention_heads = num_attention_heads
183
+ self.initializer_range = initializer_range
184
+ self.initializer_cutoff_factor = initializer_cutoff_factor
185
+ self.norm_eps = norm_eps
186
+ self.norm_bias = norm_bias
187
+ self.global_rope_theta = global_rope_theta
188
+ self.attention_bias = attention_bias
189
+ self.attention_dropout = attention_dropout
190
+ self.hidden_activation = hidden_activation
191
+ self.global_attn_every_n_layers = global_attn_every_n_layers
192
+ self.local_attention = local_attention
193
+ self.local_rope_theta = local_rope_theta
194
+ self.embedding_dropout = embedding_dropout
195
+ self.mlp_bias = mlp_bias
196
+ self.mlp_dropout = mlp_dropout
197
+ self.decoder_bias = decoder_bias
198
+ self.classifier_pooling = classifier_pooling
199
+ self.classifier_dropout = classifier_dropout
200
+ self.classifier_bias = classifier_bias
201
+ self.classifier_activation = classifier_activation
202
+ self.deterministic_flash_attn = deterministic_flash_attn
203
+ self.sparse_prediction = sparse_prediction
204
+ self.sparse_pred_ignore_index = sparse_pred_ignore_index
205
+ self.reference_compile = reference_compile
206
+
207
+ if self.classifier_pooling not in ["cls", "mean"]:
208
+ raise ValueError(
209
+ f'Invalid value for `classifier_pooling`, should be either "cls" or "mean", but is {self.classifier_pooling}.'
210
+ )
211
+
212
+
213
+ __all__ = ["ModernBertConfig"]
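`configuration_modernbert.py` above is the auto-generated ModernBERT configuration class copied from `transformers` (see its header). A minimal sketch of instantiating it with the values visible in the `config.json` excerpt above; fields not shown in that excerpt (for example `hidden_size`) keep the upstream defaults here, and the import assumes the file sits in the working directory:

```py
from configuration_modernbert import ModernBertConfig

cfg = ModernBertConfig(
    vocab_size=2803,
    num_hidden_layers=28,
    num_attention_heads=16,
    max_position_embeddings=8192,
    local_attention=128,
    local_rope_theta=10000.0,
    norm_eps=1e-05,
    pad_token_id=1,
    sep_token_id=2,
)
print(cfg.model_type)  # "modernbert"
```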
maker.py ADDED
@@ -0,0 +1,111 @@
1
+ #! /usr/bin/python3
2
+ src="KoichiYasuoka/modernbert-large-thai-wikipedia-upos"
3
+ tgt="KoichiYasuoka/modernbert-large-thai-wikipedia-ud-embeds"
4
+ import os
5
+ os.system("""D=spaCy-Thai/UD_Thai-Corpora
6
+ test -d $D || git clone --depth=1 https://github.com/KoichiYasuoka/spaCy-Thai
7
+ nawk 'BEGIN{FS=OFS="\\t"}
8
+ {if(NF==10&&$1~/^[1-9][0-9]*$/||$0~/^# text =/)u=u$0"\\n";
9
+ else if($0==""){f=(FILENAME~/test/)?"test":(FILENAME~/dev/)?"dev":"train";
10
+ if(u~/\\t0\\troot\\t/)print u>f".conllu";
11
+ u=""}
12
+ }' $D/*-ud-*.conllu""")
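# (annotation, not part of maker.py) The shell snippet above clones KoichiYasuoka/spaCy-Thai
# if it is not already present and splits its UD_Thai-Corpora treebank into
# train.conllu / dev.conllu / test.conllu, keeping only sentences that contain a root arc.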
13
+ class UDEmbedsDataset(object):
14
+ def __init__(self,conllu,tokenizer,embeddings=None):
15
+ self.conllu=open(conllu,"r",encoding="utf-8")
16
+ self.tokenizer=tokenizer
17
+ self.embeddings=embeddings
18
+ self.seeks=[0]
19
+ label=set(["SYM","SYM."])
20
+ dep=set()
21
+ s=self.conllu.readline()
22
+ while s!="":
23
+ if s=="\n":
24
+ self.seeks.append(self.conllu.tell())
25
+ else:
26
+ w=s.split("\t")
27
+ if len(w)==10:
28
+ if w[0].isdecimal():
29
+ p=w[3]
30
+ q="" if w[5]=="_" else "|"+w[5]
31
+ d=("|" if w[6]=="0" else "|l-" if int(w[0])<int(w[6]) else "|r-")+w[7]
32
+ for k in [p,p+".","B-"+p,"B-"+p+".","I-"+p,"I-"+p+".",p+q+"|_",p+q+d]:
33
+ label.add(k)
34
+ s=self.conllu.readline()
35
+ self.label2id={l:i for i,l in enumerate(sorted(label))}
36
+ def __call__(*args):
37
+ lid={l:i for i,l in enumerate(sorted(set(sum([list(t.label2id) for t in args],[]))))}
38
+ for t in args:
39
+ t.label2id=lid
40
+ return lid
41
+ def __del__(self):
42
+ self.conllu.close()
43
+ __len__=lambda self:(len(self.seeks)-1)*2
44
+ def __getitem__(self,i):
45
+ self.conllu.seek(self.seeks[int(i/2)])
46
+ z,c,t,s=i%2,[],[""],False
47
+ while t[0]!="\n":
48
+ t=self.conllu.readline().split("\t")
49
+ if len(t)==10 and t[0].isdecimal():
50
+ if s:
51
+ t[1]=" "+t[1]
52
+ c.append(t)
53
+ s=t[9].find("SpaceAfter=No")<0
54
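# (annotation, not part of maker.py) x[j] is True when token j is the sentence root, has its
# head somewhere to its right, or has a dependent to its right; the UPOS branch below appends
# "." to the tag of tokens where this is False, and the pair-building branch enumerates pairs
# only from tokens where it is True (short sentences reset every entry to True).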
+ x=[True if t[6]=="0" or int(t[6])>j or sum([1 if int(c[i][6])==j+1 else 0 for i in range(j+1,len(c))])>0 else False for j,t in enumerate(c)]
55
+ v=self.tokenizer([t[1] for t in c],add_special_tokens=False)["input_ids"]
56
+ if z==0:
57
+ ids,upos=[self.tokenizer.cls_token_id],["SYM."]
58
+ for i,(j,k) in enumerate(zip(v,c)):
59
+ if j==[]:
60
+ j=[self.tokenizer.unk_token_id]
61
+ p=k[3] if x[i] else k[3]+"."
62
+ ids+=j
63
+ upos+=[p] if len(j)==1 else ["B-"+p]+["I-"+p]*(len(j)-1)
64
+ ids.append(self.tokenizer.sep_token_id)
65
+ upos.append("SYM.")
66
+ emb=self.embeddings
67
+ else:
68
+ import torch
69
+ if len(x)<128:
70
+ x=[True]*len(x)
71
+ else:
72
+ w=sum([len(x)-i+1 if b else 0 for i,b in enumerate(x)])+1
73
+ for i in range(len(x)):
74
+ if x[i]==False and w+len(x)-i<8192:
75
+ x[i]=True
76
+ w+=len(x)-i+1
77
+ p=[t[3] if t[5]=="_" else t[3]+"|"+t[5] for i,t in enumerate(c)]
78
+ d=[t[7] if t[6]=="0" else "l-"+t[7] if int(t[0])<int(t[6]) else "r-"+t[7] for t in c]
79
+ ids,upos=[-1],["SYM|_"]
80
+ for i in range(len(x)):
81
+ if x[i]:
82
+ ids.append(i)
83
+ upos.append(p[i]+"|"+d[i] if c[i][6]=="0" else p[i]+"|_")
84
+ for j in range(i+1,len(x)):
85
+ ids.append(j)
86
+ upos.append(p[j]+"|"+d[j] if int(c[j][6])==i+1 else p[i]+"|"+d[i] if int(c[i][6])==j+1 else p[j]+"|_")
87
+ ids.append(-1)
88
+ upos.append("SYM|_")
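# (annotation, not part of maker.py) For every token i with x[i] set, this loop lists i followed
# by all later tokens j; each position is labeled with the dependent's tag and l-/r- deprel when
# an arc links i and j, and with "UPOS|...|_" otherwise, and each group is closed with a SEP
# embedding labeled "SYM|_". The matching inputs_embeds are assembled below by summing each
# word's subword embeddings.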
89
+ with torch.no_grad():
90
+ m=[]
91
+ for j in v:
92
+ if j==[]:
93
+ j=[self.tokenizer.unk_token_id]
94
+ m.append(self.embeddings[j,:].sum(axis=0))
95
+ m.append(self.embeddings[self.tokenizer.sep_token_id,:])
96
+ emb=torch.stack(m)
97
+ return{"inputs_embeds":emb[ids[:8192],:],"labels":[self.label2id[p] for p in upos[:8192]]}
98
+ from transformers import AutoTokenizer,AutoConfig,AutoModelForTokenClassification,DefaultDataCollator,TrainingArguments,Trainer
99
+ tkz=AutoTokenizer.from_pretrained(src)
100
+ trainDS=UDEmbedsDataset("train.conllu",tkz)
101
+ devDS=UDEmbedsDataset("dev.conllu",tkz)
102
+ testDS=UDEmbedsDataset("test.conllu",tkz)
103
+ lid=trainDS(devDS,testDS)
104
+ cfg=AutoConfig.from_pretrained(src,num_labels=len(lid),label2id=lid,id2label={i:l for l,i in lid.items()},ignore_mismatched_sizes=True,trust_remote_code=True)
105
+ mdl=AutoModelForTokenClassification.from_pretrained(src,config=cfg,ignore_mismatched_sizes=True,trust_remote_code=True)
106
+ trainDS.embeddings=mdl.get_input_embeddings().weight
107
+ arg=TrainingArguments(num_train_epochs=3,per_device_train_batch_size=1,dataloader_pin_memory=False,output_dir=tgt,overwrite_output_dir=True,save_total_limit=2,learning_rate=5e-05,warmup_ratio=0.1,save_safetensors=False)
108
+ trn=Trainer(args=arg,data_collator=DefaultDataCollator(),model=mdl,train_dataset=trainDS)
109
+ trn.train()
110
+ trn.save_model(tgt)
111
+ tkz.save_pretrained(tgt)
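`maker.py` builds the training data and fine-tunes the UPOS base model in a single script: each CoNLL-U sentence contributes two items to `UDEmbedsDataset`, a subword-level UPOS item at even indices and a token-pair dependency item at odd indices, both passed to the model as `inputs_embeds`. A minimal inspection sketch, assuming the script has already produced `train.conllu` and that the `UDEmbedsDataset` class defined above is available in the same session:

```py
from transformers import AutoTokenizer, AutoModelForTokenClassification

src = "KoichiYasuoka/modernbert-large-thai-wikipedia-upos"
tkz = AutoTokenizer.from_pretrained(src)
mdl = AutoModelForTokenClassification.from_pretrained(src, trust_remote_code=True)

ds = UDEmbedsDataset("train.conllu", tkz)
ds.embeddings = mdl.get_input_embeddings().weight  # must be attached before indexing, as in maker.py

upos_item = ds[0]  # even index: subword-level UPOS / B- / I- labels
dep_item = ds[1]   # odd index: token-pair items with l-/r- deprel labels
print(upos_item["inputs_embeds"].shape, len(upos_item["labels"]))
```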
modeling_modernbert.py ADDED
@@ -0,0 +1,1351 @@
1
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
2
+ # This file was automatically generated from src/transformers/models/modernbert/modular_modernbert.py.
3
+ # Do NOT edit this file manually as any edits will be overwritten by the generation of
4
+ # the file from the modular. If any change should be done, please apply the change to the
5
+ # modular_modernbert.py file directly. One of our CI enforces this.
6
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
7
+ # Copyright 2024 Answer.AI, LightOn, and contributors, and the HuggingFace Inc. team. All rights reserved.
8
+ #
9
+ #
10
+ # Licensed under the Apache License, Version 2.0 (the "License");
11
+ # you may not use this file except in compliance with the License.
12
+ # You may obtain a copy of the License at
13
+ #
14
+ # http://www.apache.org/licenses/LICENSE-2.0
15
+ #
16
+ # Unless required by applicable law or agreed to in writing, software
17
+ # distributed under the License is distributed on an "AS IS" BASIS,
18
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
19
+ # See the License for the specific language governing permissions and
20
+ # limitations under the License.
21
+
22
+ import math
23
+ from typing import Dict, Optional, Tuple, Union
24
+
25
+ import torch
26
+ import torch.nn.functional as F
27
+ from torch import nn
28
+ from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
29
+
30
+ from transformers.activations import ACT2FN
31
+ from transformers.modeling_attn_mask_utils import _prepare_4d_attention_mask
32
+ from transformers.modeling_outputs import BaseModelOutput, MaskedLMOutput, SequenceClassifierOutput, TokenClassifierOutput
33
+ from transformers.modeling_utils import PreTrainedModel
34
+ from transformers.utils import (
35
+ add_code_sample_docstrings,
36
+ add_start_docstrings,
37
+ add_start_docstrings_to_model_forward,
38
+ is_flash_attn_2_available,
39
+ logging,
40
+ )
41
+ import importlib
42
+ is_triton_available = lambda: importlib.util.find_spec("triton") is not None
43
+ from .configuration_modernbert import ModernBertConfig
44
+
45
+
46
+ if is_flash_attn_2_available():
47
+ from flash_attn.flash_attn_interface import flash_attn_varlen_qkvpacked_func
48
+ from flash_attn.layers.rotary import RotaryEmbedding
49
+ from flash_attn.ops.triton.rotary import apply_rotary
50
+ else:
51
+ RotaryEmbedding = object
52
+
53
+ logger = logging.get_logger(__name__)
54
+
55
+ _CHECKPOINT_FOR_DOC = "answerdotai/ModernBERT-base"
56
+ _CONFIG_FOR_DOC = "ModernBertConfig"
57
+
58
+
59
+ class ApplyRotaryEmbUnpad(torch.autograd.Function):
60
+ @staticmethod
61
+ def forward(
62
+ ctx,
63
+ qkv,
64
+ cos,
65
+ sin,
66
+ cu_seqlens: Optional[torch.Tensor] = None,
67
+ max_seqlen: Optional[int] = None,
68
+ ):
69
+ # (total_nnz, 3, nheads, headdim)
70
+ qkv = qkv.contiguous()
71
+ total_nnz, _three, _nheads, headdim = qkv.shape
72
+ # We need qkv to be contiguous so that when we reshape to combine (3, nheads) dimensions,
73
+ # we get the same tensor
74
+ # qk = rearrange(qkv[:, :2], "b_s t h d -> b_s (t h) d")
75
+ qk = qkv[:, :2].view(total_nnz, -1, headdim)
76
+ apply_rotary(
77
+ qk,
78
+ cos,
79
+ sin,
80
+ seqlen_offsets=0,
81
+ cu_seqlens=cu_seqlens,
82
+ max_seqlen=max_seqlen,
83
+ interleaved=False,
84
+ inplace=True,
85
+ )
86
+
87
+ ctx.save_for_backward(cos, sin, cu_seqlens)
88
+ ctx.max_seqlen = max_seqlen
89
+ return qkv
90
+
91
+ @staticmethod
92
+ def backward(ctx, do):
93
+ cos, sin, cu_seqlens = ctx.saved_tensors
94
+ do = do.contiguous()
95
+ total_nnz, _three, _nheads, headdim = do.shape
96
+ # We need dqkv to be contiguous so that when we reshape to combine (3, nheads) dimensions,
97
+ # we get the same tensor
98
+ dqk = do[:, :2].view(total_nnz, -1, headdim)
99
+ apply_rotary(
100
+ dqk,
101
+ cos,
102
+ sin,
103
+ seqlen_offsets=0,
104
+ cu_seqlens=cu_seqlens,
105
+ max_seqlen=ctx.max_seqlen,
106
+ interleaved=False,
107
+ inplace=True,
108
+ conjugate=True,
109
+ )
110
+
111
+ return do, None, None, None, None, None, None
112
+
113
+
114
+ def apply_rotary_unpadded(
115
+ qkv,
116
+ cos,
117
+ sin,
118
+ cu_seqlens: Optional[torch.Tensor] = None,
119
+ max_seqlen: Optional[int] = None,
120
+ ):
121
+ """
122
+ Arguments:
123
+ qkv: (total_nnz, 3, nheads, headdim) - input tensor for packed QKV.
124
+ cos, sin: (seqlen_rotary, rotary_dim / 2)
125
+ interleaved: if True, rotate pairs of even and odd dimensions (GPT-J style) instead
126
+ of 1st half and 2nd half (GPT-NeoX style).
127
+ inplace: if True, apply rotary embedding in-place.
128
+ seqlen_offsets: (batch_size,) or int. Each sequence in x is shifted by this amount.
129
+ Most commonly used in inference when we have KV cache.
130
+ cu_seqlens: (batch + 1,) or None
131
+ max_seqlen: int
132
+ Return:
133
+ out: (total_nnz, dim)
134
+ rotary_dim must be <= headdim
135
+ Apply rotary embedding to the first rotary_dim of x.
136
+ """
137
+ return ApplyRotaryEmbUnpad.apply(qkv, cos, sin, cu_seqlens, max_seqlen)
138
+
139
+
140
+ class ModernBertUnpaddedRotaryEmbedding(RotaryEmbedding):
141
+ """
142
+ The rotary position embeddings applied directly to unpadded sequences.
143
+ """
144
+
145
+ def __init__(
146
+ self,
147
+ dim: int,
148
+ base: float = 10000.0,
149
+ max_seqlen: Optional[int] = None,
150
+ device: Optional[torch.device] = None,
151
+ dtype: Optional[torch.dtype] = None,
152
+ ):
153
+ """
154
+ max_seqlen: if max_seqlen, device, and dtype are provided, we precompute the cos_sin_cache
155
+ up to max_seqlen. If the max_seqlen, device, or dtype during training/inference differ,
156
+ the cos_sin_cache will be recomputed during the forward pass.
157
+ """
158
+ super().__init__(dim=dim, base=base, pos_idx_in_fp32=True, device=device, interleaved=False)
159
+ self.max_seqlen = max_seqlen
160
+
161
+ if max_seqlen is not None and device is not None and dtype is not None:
162
+ self._update_cos_sin_cache(max_seqlen, device=device, dtype=dtype)
163
+
164
+ def forward(
165
+ self,
166
+ qkv: torch.Tensor,
167
+ cu_seqlens: torch.Tensor,
168
+ max_seqlen: Optional[int] = None,
169
+ ) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
170
+ """
171
+ Apply rotary embedding *inplace* to qkv.
172
+ qkv: (total_nnz, 3, nheads, headdim)
173
+ cu_seqlens: (batch + 1,) cumulative sequence lengths
174
+ max_seqlen: int max seq length in the batch
175
+ """
176
+ if max_seqlen is not None:
177
+ self._update_cos_sin_cache(max_seqlen, device=qkv.device, dtype=qkv.dtype)
178
+
179
+ qkv = apply_rotary_unpadded(
180
+ qkv,
181
+ self._cos_cached,
182
+ self._sin_cached,
183
+ cu_seqlens=cu_seqlens,
184
+ max_seqlen=max_seqlen,
185
+ )
186
+
187
+ return qkv
188
+
189
+ def extra_repr(self) -> str:
190
+ return f"dim={self.dim}, base={self.base}, scale_base={self.scale_base}"
191
+
192
+
193
+ class ModernBertEmbeddings(nn.Module):
194
+ """
195
+ Same as BertEmbeddings with a tiny tweak for positional embeddings indexing.
196
+ """
197
+
198
+ def __init__(self, config: ModernBertConfig):
199
+ super().__init__()
200
+ self.config = config
201
+ self.tok_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
202
+ self.norm = nn.LayerNorm(config.hidden_size, eps=config.norm_eps, bias=config.norm_bias)
203
+ self.drop = nn.Dropout(config.embedding_dropout)
204
+
205
+ @torch.compile(dynamic=True)
206
+ def compiled_embeddings(self, input_ids: torch.LongTensor) -> torch.Tensor:
207
+ return self.drop(self.norm(self.tok_embeddings(input_ids)))
208
+
209
+ def forward(
210
+ self, input_ids: torch.LongTensor = None, inputs_embeds: Optional[torch.Tensor] = None
211
+ ) -> torch.Tensor:
212
+ if inputs_embeds is not None:
213
+ hidden_states = self.drop(self.norm(inputs_embeds))
214
+ else:
215
+ hidden_states = (
216
+ self.compiled_embeddings(input_ids)
217
+ if self.config.reference_compile
218
+ else self.drop(self.norm(self.tok_embeddings(input_ids)))
219
+ )
220
+ return hidden_states
221
+
222
+
223
+ class ModernBertMLP(nn.Module):
224
+ """Applies the GLU at the end of each ModernBERT layer.
225
+
226
+ Compared to the default BERT architecture, this block replaces :class:`~transformers.model.bert.modeling_bert.BertIntermediate`
227
+ and :class:`~transformers.model.bert.modeling_bert.SelfOutput` with a single module that has similar functionality.
228
+ """
229
+
230
+ def __init__(self, config: ModernBertConfig):
231
+ super().__init__()
232
+ self.config = config
233
+ self.Wi = nn.Linear(config.hidden_size, int(config.intermediate_size) * 2, bias=config.mlp_bias)
234
+ self.act = ACT2FN[config.hidden_activation]
235
+ self.drop = nn.Dropout(config.mlp_dropout)
236
+ self.Wo = nn.Linear(config.intermediate_size, config.hidden_size, bias=config.mlp_bias)
237
+
238
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
239
+ input, gate = self.Wi(hidden_states).chunk(2, dim=-1)
240
+ return self.Wo(self.drop(self.act(input) * gate))
241
+
242
+
243
+ class ModernBertRotaryEmbedding(nn.Module):
244
+ def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
245
+ super().__init__()
246
+
247
+ self.dim = dim
248
+ self.max_position_embeddings = max_position_embeddings
249
+ self.base = base
250
+ inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64).float() / self.dim))
251
+ self.register_buffer("inv_freq", tensor=inv_freq, persistent=False)
252
+
253
+ @torch.no_grad()
254
+ def forward(self, x, position_ids, seq_len=None):
255
+ # x: [bs, num_attention_heads, seq_len, head_size]
256
+ self.inv_freq.to(x.device)
257
+ inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
258
+ position_ids_expanded = position_ids[:, None, :].float()
259
+ # Force float32 since bfloat16 loses precision on long contexts
260
+ # See https://github.com/huggingface/transformers/pull/29285
261
+ device_type = x.device.type
262
+ device_type = device_type if isinstance(device_type, str) and device_type != "mps" else "cpu"
263
+ with torch.autocast(device_type=device_type, enabled=False):
264
+ freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
265
+ emb = torch.cat((freqs, freqs), dim=-1)
266
+ cos = emb.cos()
267
+ sin = emb.sin()
268
+ return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
269
+
270
+
271
+ def rotate_half(x):
272
+ """Rotates half the hidden dims of the input."""
273
+ x1 = x[..., : x.shape[-1] // 2]
274
+ x2 = x[..., x.shape[-1] // 2 :]
275
+ return torch.cat((-x2, x1), dim=-1)
276
+
277
+
278
+ def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
279
+ """Applies Rotary Position Embedding to the query and key tensors.
280
+
281
+ Args:
282
+ q (`torch.Tensor`): The query tensor.
283
+ k (`torch.Tensor`): The key tensor.
284
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
285
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
286
+ position_ids (`torch.Tensor`, *optional*):
287
+ Deprecated and unused.
288
+ unsqueeze_dim (`int`, *optional*, defaults to 1):
289
+ The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
290
+ sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
291
+ that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
292
+ k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
293
+ cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
294
+ the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
295
+ Returns:
296
+ `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
297
+ """
298
+ cos = cos.unsqueeze(unsqueeze_dim)
299
+ sin = sin.unsqueeze(unsqueeze_dim)
300
+ q_embed = (q * cos) + (rotate_half(q) * sin)
301
+ k_embed = (k * cos) + (rotate_half(k) * sin)
302
+ return q_embed, k_embed
303
+
304
+
305
+ def eager_attention_forward(
306
+ module: "ModernBertAttention",
307
+ qkv: torch.Tensor,
308
+ attention_mask: torch.Tensor,
309
+ sliding_window_mask: torch.Tensor,
310
+ position_ids: Optional[torch.LongTensor],
311
+ local_attention: Tuple[int, int],
312
+ bs: int,
313
+ dim: int,
314
+ output_attentions: Optional[bool] = False,
315
+ **_kwargs,
316
+ ) -> Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor]]:
317
+ # qkv: [batch_size, seqlen, 3, nheads, headdim]
318
+ cos, sin = module.rotary_emb(qkv, position_ids=position_ids)
319
+ query, key, value = qkv.transpose(3, 1).unbind(dim=2)
320
+ # query, key, value: [batch_size, heads, seq_len, head_dim]
321
+ query, key = apply_rotary_pos_emb(query, key, cos, sin)
322
+
323
+ scale = module.head_dim**-0.5
324
+ attn_weights = torch.matmul(query, key.transpose(2, 3)) * scale
325
+
326
+ if local_attention != (-1, -1):
327
+ attention_mask = sliding_window_mask
328
+
329
+ attn_weights = attn_weights + attention_mask
330
+
331
+ # upcast attention to fp32
332
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
333
+ attn_weights = nn.functional.dropout(attn_weights, p=module.attention_dropout, training=module.training)
334
+ attn_output = torch.matmul(attn_weights, value)
335
+ attn_output = attn_output.transpose(1, 2).contiguous()
336
+ attn_output = attn_output.view(bs, -1, dim)
337
+ if output_attentions:
338
+ return (attn_output, attn_weights)
339
+ return (attn_output,)
340
+
341
+
342
+ def flash_attention_forward(
343
+ module: "ModernBertAttention",
344
+ qkv: torch.Tensor,
345
+ rotary_emb: ModernBertUnpaddedRotaryEmbedding,
346
+ cu_seqlens: torch.Tensor,
347
+ max_seqlen: int,
348
+ local_attention: Tuple[int, int],
349
+ bs: int,
350
+ dim: int,
351
+ target_dtype: torch.dtype = torch.bfloat16,
352
+ **_kwargs,
353
+ ) -> Tuple[torch.Tensor]:
354
+ # (total_seqlen, 3, nheads, headdim)
355
+ qkv = rotary_emb(qkv, cu_seqlens=cu_seqlens, max_seqlen=max_seqlen)
356
+
357
+ convert_dtype = qkv.dtype not in (torch.float16, torch.bfloat16)
358
+ if convert_dtype:
359
+ # FA2 implementation only supports fp16 and bf16. If FA2 is supported,
360
+ # bfloat16 must be supported as of FA2 2.5.7. (Turing GPUs not supported)
361
+ orig_dtype = qkv.dtype
362
+ qkv = qkv.to(target_dtype)
363
+
364
+ attn = flash_attn_varlen_qkvpacked_func(
365
+ qkv,
366
+ cu_seqlens=cu_seqlens,
367
+ max_seqlen=max_seqlen,
368
+ dropout_p=module.attention_dropout if module.training else 0.0,
369
+ deterministic=module.deterministic_flash_attn,
370
+ window_size=local_attention,
371
+ )
372
+ attn = attn.to(orig_dtype) # type: ignore
373
+ else:
374
+ attn = flash_attn_varlen_qkvpacked_func(
375
+ qkv,
376
+ cu_seqlens=cu_seqlens,
377
+ max_seqlen=max_seqlen,
378
+ dropout_p=module.attention_dropout if module.training else 0.0,
379
+ deterministic=module.deterministic_flash_attn,
380
+ window_size=local_attention,
381
+ )
382
+ return (attn.view(bs, dim),)
383
+
384
+
385
+ def sdpa_attention_forward(
386
+ module: "ModernBertAttention",
387
+ qkv: torch.Tensor,
388
+ attention_mask: torch.Tensor,
389
+ sliding_window_mask: torch.Tensor,
390
+ position_ids: Optional[torch.LongTensor],
391
+ local_attention: Tuple[int, int],
392
+ bs: int,
393
+ dim: int,
394
+ **_kwargs,
395
+ ) -> Tuple[torch.Tensor]:
396
+ # qkv: [batch_size, seqlen, 3, nheads, headdim]
397
+ cos, sin = module.rotary_emb(qkv, position_ids=position_ids)
398
+ query, key, value = qkv.transpose(3, 1).unbind(dim=2)
399
+ # query, key, value: [batch_size, heads, seq_len, head_dim]
400
+ query, key = apply_rotary_pos_emb(query, key, cos, sin)
401
+
402
+ if local_attention != (-1, -1):
403
+ attention_mask = sliding_window_mask
404
+
405
+ attn_output = (
406
+ F.scaled_dot_product_attention(
407
+ query,
408
+ key,
409
+ value,
410
+ dropout_p=module.attention_dropout if module.training else 0.0,
411
+ attn_mask=attention_mask,
412
+ )
413
+ .transpose(1, 2)
414
+ .contiguous()
415
+ )
416
+ attn_output = attn_output.view(bs, -1, dim)
417
+ return (attn_output,)
418
+
419
+
420
+ MODERNBERT_ATTENTION_FUNCTION = {
421
+ "flash_attention_2": flash_attention_forward,
422
+ "eager": eager_attention_forward,
423
+ "sdpa": sdpa_attention_forward,
424
+ }
425
+
426
+
427
+ class ModernBertAttention(nn.Module):
428
+ """Performs multi-headed self attention on a batch of unpadded sequences.
429
+
430
+ If Flash Attention 2 is installed, this module uses Flash Attention to improve throughput.
431
+ If Flash Attention 2 is not installed, the implementation will use PyTorch's SDPA kernel,
432
+ which requires padding and unpadding inputs, adding some overhead.
433
+
434
+ See `forward` method for additional details.
435
+ """
436
+
437
+ def __init__(self, config: ModernBertConfig, layer_id: Optional[int] = None):
438
+ super().__init__()
439
+ self.config = config
440
+ self.layer_id = layer_id
441
+
442
+ if config.hidden_size % config.num_attention_heads != 0:
443
+ raise ValueError(
444
+ f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention heads ({config.num_attention_heads})"
445
+ )
446
+
447
+ self.attention_dropout = config.attention_dropout
448
+ self.deterministic_flash_attn = config.deterministic_flash_attn
449
+ self.num_heads = config.num_attention_heads
450
+ self.head_dim = config.hidden_size // config.num_attention_heads
451
+ self.all_head_size = self.head_dim * self.num_heads
452
+ self.Wqkv = nn.Linear(config.hidden_size, 3 * self.all_head_size, bias=config.attention_bias)
453
+
454
+ if layer_id % config.global_attn_every_n_layers != 0:
455
+ self.local_attention = (config.local_attention // 2, config.local_attention // 2)
456
+ else:
457
+ self.local_attention = (-1, -1)
458
+
459
+ rope_theta = config.global_rope_theta
460
+ max_position_embeddings = config.max_position_embeddings
461
+ if self.local_attention != (-1, -1):
462
+ if config.local_rope_theta is not None:
463
+ rope_theta = config.local_rope_theta
464
+ max_position_embeddings = config.local_attention
465
+
466
+ if config._attn_implementation == "flash_attention_2":
467
+ self.rotary_emb = ModernBertUnpaddedRotaryEmbedding(
468
+ dim=self.head_dim, max_seqlen=max_position_embeddings, base=rope_theta
469
+ )
470
+ else:
471
+ self.rotary_emb = ModernBertRotaryEmbedding(
472
+ dim=self.head_dim, max_position_embeddings=max_position_embeddings, base=rope_theta
473
+ )
474
+
475
+ self.Wo = nn.Linear(config.hidden_size, config.hidden_size, bias=config.attention_bias)
476
+ self.out_drop = nn.Dropout(config.attention_dropout) if config.attention_dropout > 0.0 else nn.Identity()
477
+ self.pruned_heads = set()
478
+
479
+ def forward(
480
+ self,
481
+ hidden_states: torch.Tensor,
482
+ output_attentions: Optional[bool] = False,
483
+ **kwargs,
484
+ ) -> torch.Tensor:
485
+ qkv = self.Wqkv(hidden_states)
486
+
487
+ bs = hidden_states.shape[0]
488
+ if self.config._attn_implementation == "flash_attention_2":
489
+ qkv = qkv.view(-1, 3, self.num_heads, self.head_dim)
490
+ else:
491
+ qkv = qkv.view(bs, -1, 3, self.num_heads, self.head_dim)
492
+
493
+ attn_outputs = MODERNBERT_ATTENTION_FUNCTION[self.config._attn_implementation](
494
+ self,
495
+ qkv=qkv,
496
+ rotary_emb=self.rotary_emb,
497
+ local_attention=self.local_attention,
498
+ bs=bs,
499
+ dim=self.all_head_size,
500
+ output_attentions=output_attentions,
501
+ **kwargs,
502
+ )
503
+ hidden_states = attn_outputs[0]
504
+ hidden_states = self.out_drop(self.Wo(hidden_states))
505
+
506
+ return (hidden_states,) + attn_outputs[1:] # add attentions if outputted
507
+
508
+
509
+ class ModernBertEncoderLayer(nn.Module):
510
+ def __init__(self, config: ModernBertConfig, layer_id: Optional[int] = None):
511
+ super().__init__()
512
+ self.config = config
513
+ if layer_id == 0:
514
+ self.attn_norm = nn.Identity()
515
+ else:
516
+ self.attn_norm = nn.LayerNorm(config.hidden_size, eps=config.norm_eps, bias=config.norm_bias)
517
+ self.attn = ModernBertAttention(config=config, layer_id=layer_id)
518
+ self.mlp_norm = nn.LayerNorm(config.hidden_size, eps=config.norm_eps, bias=config.norm_bias)
519
+ self.mlp = ModernBertMLP(config)
520
+
521
+ @torch.compile(dynamic=True)
522
+ def compiled_mlp(self, hidden_states: torch.Tensor) -> torch.Tensor:
523
+ return self.mlp(self.mlp_norm(hidden_states))
524
+
525
+ def forward(
526
+ self,
527
+ hidden_states: torch.Tensor,
528
+ attention_mask: Optional[torch.Tensor] = None,
529
+ sliding_window_mask: Optional[torch.Tensor] = None,
530
+ position_ids: Optional[torch.LongTensor] = None,
531
+ cu_seqlens: Optional[torch.Tensor] = None,
532
+ max_seqlen: Optional[int] = None,
533
+ output_attentions: Optional[bool] = False,
534
+ ) -> torch.Tensor:
535
+ attn_outputs = self.attn(
536
+ self.attn_norm(hidden_states),
537
+ attention_mask=attention_mask,
538
+ sliding_window_mask=sliding_window_mask,
539
+ position_ids=position_ids,
540
+ cu_seqlens=cu_seqlens,
541
+ max_seqlen=max_seqlen,
542
+ output_attentions=output_attentions,
543
+ )
544
+ hidden_states = hidden_states + attn_outputs[0]
545
+ mlp_output = (
546
+ self.compiled_mlp(hidden_states)
547
+ if self.config.reference_compile
548
+ else self.mlp(self.mlp_norm(hidden_states))
549
+ )
550
+ hidden_states = hidden_states + mlp_output
551
+
552
+ return (hidden_states,) + attn_outputs[1:] # add attentions if outputted
553
+
554
+
555
+ MODERNBERT_START_DOCSTRING = r"""
556
+ This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
557
+ library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
558
+ etc.)
559
+
560
+ This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
561
+ Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
562
+ and behavior.
563
+
564
+ Parameters:
565
+ config ([`ModernBertConfig`]):
566
+ Model configuration class with all the parameters of the model. Initializing with a config file does not
567
+ load the weights associated with the model, only the configuration. Check out the
568
+ [`~PreTrainedModel.from_pretrained`] method to load the model weights.
569
+ """
570
+
571
+
572
+ @add_start_docstrings(
573
+ "The bare ModernBert Model outputting raw hidden-states without any specific head on top.",
574
+ MODERNBERT_START_DOCSTRING,
575
+ )
576
+ class ModernBertPreTrainedModel(PreTrainedModel):
577
+ config_class = ModernBertConfig
578
+ base_model_prefix = "model"
579
+ supports_gradient_checkpointing = True
580
+ _no_split_modules = ["ModernBertEmbeddings", "ModernBertEncoderLayer"]
581
+ _supports_flash_attn_2 = True
582
+ _supports_sdpa = True
583
+ _supports_flex_attn = False
584
+
585
+ def _init_weights(self, module: nn.Module):
586
+ cutoff_factor = self.config.initializer_cutoff_factor
587
+ if cutoff_factor is None:
588
+ cutoff_factor = 3
589
+
590
+ def init_weight(module: nn.Module, std: float):
591
+ nn.init.trunc_normal_(
592
+ module.weight,
593
+ mean=0.0,
594
+ std=std,
595
+ a=-cutoff_factor * std,
596
+ b=cutoff_factor * std,
597
+ )
598
+
599
+ if isinstance(module, nn.Linear):
600
+ if module.bias is not None:
601
+ nn.init.zeros_(module.bias)
602
+
603
+ stds = {
604
+ "in": self.config.initializer_range,
605
+ "out": self.config.initializer_range / math.sqrt(2.0 * self.config.num_hidden_layers),
606
+ "embedding": self.config.initializer_range,
607
+ "final_out": self.config.hidden_size**-0.5,
608
+ }
609
+
610
+ if isinstance(module, ModernBertEmbeddings):
611
+ init_weight(module.tok_embeddings, stds["embedding"])
612
+ elif isinstance(module, ModernBertMLP):
613
+ init_weight(module.Wi, stds["in"])
614
+ init_weight(module.Wo, stds["out"])
615
+ elif isinstance(module, ModernBertAttention):
616
+ init_weight(module.Wqkv, stds["in"])
617
+ init_weight(module.Wo, stds["out"])
618
+ elif isinstance(module, ModernBertPredictionHead):
619
+ init_weight(module.dense, stds["out"])
620
+ elif isinstance(module, ModernBertForMaskedLM):
621
+ init_weight(module.decoder, stds["out"])
622
+ elif isinstance(module, (ModernBertForSequenceClassification, ModernBertForTokenClassification)):
623
+ init_weight(module.classifier, stds["final_out"])
624
+
625
+ @classmethod
626
+ def _autoset_attn_implementation(
627
+ cls,
628
+ config,
629
+ use_flash_attention_2: bool = False,
630
+ torch_dtype: Optional[torch.dtype] = None,
631
+ device_map: Optional[Union[str, Dict[str, int]]] = None,
632
+ check_device_map: bool = True,
633
+ ):
634
+ # If the user didn't specify anything, try to use flash_attention_2 if available.
635
+ # Otherwise we fall back to the default SDPA -> Eager from the super() method.
636
+ if config._attn_implementation_internal is None:
637
+ config._attn_implementation_internal = "flash_attention_2"
638
+ try:
639
+ return cls._check_and_enable_flash_attn_2(
640
+ config,
641
+ torch_dtype=torch_dtype,
642
+ device_map=device_map,
643
+ hard_check_only=False,
644
+ check_device_map=check_device_map,
645
+ )
646
+ except (ValueError, ImportError):
647
+ config._attn_implementation_internal = None
648
+ return super()._autoset_attn_implementation(
649
+ config,
650
+ use_flash_attention_2=use_flash_attention_2,
651
+ torch_dtype=torch_dtype,
652
+ device_map=device_map,
653
+ check_device_map=check_device_map,
654
+ )
655
+
656
+ def _maybe_set_compile(self):
657
+ if self.config.reference_compile is False:
658
+ return
659
+
660
+ if hasattr(self, "hf_device_map") and len(self.hf_device_map) > 1:
661
+ if self.config.reference_compile:
662
+ logger.warning_once(
663
+ "If `accelerate` split the model across devices, `torch.compile` will not work. "
664
+ "Falling back to non-compiled mode."
665
+ )
666
+ self.config.reference_compile = False
667
+
668
+ if self.device.type == "mps":
669
+ if self.config.reference_compile:
670
+ logger.warning_once(
671
+ "Compiling the model with `torch.compile` and using a `torch.mps` device is not supported. "
672
+ "Falling back to non-compiled mode."
673
+ )
674
+ self.config.reference_compile = False
675
+
676
+ if self.config.reference_compile is None:
677
+ self.config.reference_compile = is_triton_available()
678
+
679
+ def resize_token_embeddings(self, *args, **kwargs):
680
+ model_embeds = super().resize_token_embeddings(*args, **kwargs)
681
+
682
+ if self.config.reference_compile in {True, None}:
683
+ if self.config.reference_compile:
684
+ logger.warning_once(
685
+ "Resizing token embeddings with `torch.compile` is not supported. Falling back to non-compiled mode."
686
+ )
687
+ self.config.reference_compile = False
688
+
689
+ return model_embeds
690
+
691
+
692
+ def _unpad_modernbert_input(
693
+ inputs: torch.Tensor,
694
+ attention_mask: torch.Tensor,
695
+ position_ids: Optional[torch.Tensor] = None,
696
+ labels: Optional[torch.Tensor] = None,
697
+ ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, int, Optional[torch.Tensor], Optional[torch.Tensor]]:
698
+ """
699
+ Remove padding from input sequences.
700
+
701
+ Args:
702
+ inputs: (batch, seqlen, ...) or (batch, seqlen)
703
+ attention_mask: (batch, seqlen), bool / int, 1 means valid and 0 means not valid.
704
+ position_ids: (batch, seqlen), int, position ids
705
+ labels: (batch, seqlen), int, labels
706
+
707
+ Returns:
708
+ unpadded_inputs: (total_nnz, ...), where total_nnz = number of tokens selected in attention_mask.
709
+ indices: (total_nnz)
710
+ cu_seqlens: (batch + 1), the cumulative sequence lengths
711
+ max_seqlen_in_batch: int
712
+ unpadded_position_ids: (total_nnz) or None
713
+ unpadded_labels: (total_nnz) or None
714
+ """
715
+ seqlens_in_batch = attention_mask.sum(dim=-1, dtype=torch.int32)
716
+ indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
717
+ max_seqlen_in_batch = int(seqlens_in_batch.max().item())
718
+ cu_seqlens = torch.nn.functional.pad(torch.cumsum(seqlens_in_batch, dim=0, dtype=torch.int32), (1, 0))
719
+
720
+ if inputs.dim() == 2:
721
+ unpadded_inputs = inputs.flatten()[indices]
722
+ else:
723
+ batch, seqlen, *rest = inputs.shape
724
+ shape = batch * seqlen
725
+ unpadded_inputs = inputs.view(shape, *rest)[indices]
726
+
727
+ unpadded_position_ids = position_ids.flatten()[indices] if position_ids is not None else None
728
+ unpadded_labels = labels.flatten()[indices] if labels is not None else None
729
+
730
+ return unpadded_inputs, indices, cu_seqlens, max_seqlen_in_batch, unpadded_position_ids, unpadded_labels
731
+
732
+
733
+ def _pad_modernbert_output(
734
+ inputs: torch.Tensor,
735
+ indices: torch.Tensor,
736
+ batch: int,
737
+ seqlen: int,
738
+ ) -> torch.Tensor:
739
+ """
740
+ Add padding to sequences.
741
+
742
+ Args:
743
+ inputs: (total_nnz, ...) or (total_nnz,), where total_nnz = number of tokens selected in attention_mask.
744
+ indices: (total_nnz)
745
+ batch: int, batch size
746
+ seqlen: int, max sequence length
747
+
748
+ Returns:
749
+ padded_inputs: (batch, seqlen, ...) or (batch, seqlen)
750
+ """
751
+ if inputs.dim() == 1:
752
+ output = torch.zeros(batch * seqlen, dtype=inputs.dtype, device=inputs.device)
753
+ output[indices] = inputs
754
+ padded_inputs = output.view(batch, seqlen)
755
+ else:
756
+ _, *rest = inputs.shape
757
+ output = torch.zeros(batch * seqlen, *rest, dtype=inputs.dtype, device=inputs.device)
758
+ output[indices] = inputs
759
+ padded_inputs = output.view(batch, seqlen, *rest)
760
+
761
+ return padded_inputs
762
+
763
+
764
+ MODERNBERT_INPUTS_DOCSTRING = r"""
765
+ Args:
766
+ input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
767
+ Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
768
+ it.
769
+
770
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
771
+ [`PreTrainedTokenizer.__call__`] for details.
772
+
773
+ [What are input IDs?](../glossary#input-ids)
774
+ attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
775
+ Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
776
+
777
+ - 1 for tokens that are **not masked**,
778
+ - 0 for tokens that are **masked**.
779
+
780
+ [What are attention masks?](../glossary#attention-mask)
781
+
782
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
783
+ [`PreTrainedTokenizer.__call__`] for details.
784
+
785
+ If you want to change padding behavior, you can construct the mask yourself and pass it in instead of
786
+ relying on the default strategy, which simply prevents attention to padding tokens.
791
+ sliding_window_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
792
+ Mask to avoid performing attention on padding or far-away tokens. In ModernBert, only every few layers
793
+ perform global attention, while the rest perform local attention. This mask is used to avoid attending to
794
+ far-away tokens in the local attention layers.
795
+ position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
796
+ Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
797
+ config.n_positions - 1]`.
798
+
799
+ [What are position IDs?](../glossary#position-ids)
800
+ inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
801
+ Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
802
+ is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
803
+ model's internal embedding lookup matrix.
804
+ indices (`torch.Tensor` of shape `(total_unpadded_tokens,)`, *optional*):
805
+ Indices of the non-padding tokens in the input sequence. Used for unpadding the output.
806
+ cu_seqlens (`torch.Tensor` of shape `(batch + 1,)`, *optional*):
807
+ Cumulative sequence lengths of the input sequences. Used to index the unpadded tensors.
808
+ max_seqlen (`int`, *optional*):
809
+ Maximum sequence length in the batch. Used to pad the output tensors.
810
+ batch_size (`int`, *optional*):
811
+ Batch size of the input sequences. Used to pad the output tensors.
812
+ seq_len (`int`, *optional*):
813
+ Sequence length of the input sequences. Used to pad the output tensors.
814
+ output_attentions (`bool`, *optional*):
815
+ Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
816
+ tensors for more detail.
817
+ output_hidden_states (`bool`, *optional*):
818
+ Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
819
+ more detail.
820
+ return_dict (`bool`, *optional*):
821
+ Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
822
+ """
823
+
824
+
825
+ @add_start_docstrings(
826
+ "The bare ModernBert Model outputting raw hidden-states without any specific head on top.",
827
+ MODERNBERT_START_DOCSTRING,
828
+ )
829
+ class ModernBertModel(ModernBertPreTrainedModel):
830
+ def __init__(self, config: ModernBertConfig):
831
+ super().__init__(config)
832
+ self.config = config
833
+ self.embeddings = ModernBertEmbeddings(config)
834
+ self.layers = nn.ModuleList(
835
+ [ModernBertEncoderLayer(config, layer_id) for layer_id in range(config.num_hidden_layers)]
836
+ )
837
+ self.final_norm = nn.LayerNorm(config.hidden_size, eps=config.norm_eps, bias=config.norm_bias)
838
+ self.gradient_checkpointing = False
839
+ self.post_init()
840
+
841
+ def get_input_embeddings(self):
842
+ return self.embeddings.tok_embeddings
843
+
844
+ def set_input_embeddings(self, value):
845
+ self.embeddings.tok_embeddings = value
846
+
847
+ @add_start_docstrings_to_model_forward(MODERNBERT_INPUTS_DOCSTRING)
848
+ @add_code_sample_docstrings(
849
+ checkpoint=_CHECKPOINT_FOR_DOC,
850
+ output_type=BaseModelOutput,
851
+ config_class=_CONFIG_FOR_DOC,
852
+ )
853
+ def forward(
854
+ self,
855
+ input_ids: Optional[torch.LongTensor] = None,
856
+ attention_mask: Optional[torch.Tensor] = None,
857
+ sliding_window_mask: Optional[torch.Tensor] = None,
858
+ position_ids: Optional[torch.LongTensor] = None,
859
+ inputs_embeds: Optional[torch.Tensor] = None,
860
+ indices: Optional[torch.Tensor] = None,
861
+ cu_seqlens: Optional[torch.Tensor] = None,
862
+ max_seqlen: Optional[int] = None,
863
+ batch_size: Optional[int] = None,
864
+ seq_len: Optional[int] = None,
865
+ output_attentions: Optional[bool] = None,
866
+ output_hidden_states: Optional[bool] = None,
867
+ return_dict: Optional[bool] = None,
868
+ ) -> Union[Tuple[torch.Tensor, ...], BaseModelOutput]:
869
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
870
+ output_hidden_states = (
871
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
872
+ )
873
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
874
+
875
+ if (input_ids is None) ^ (inputs_embeds is not None):
876
+ raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
877
+
878
+ all_hidden_states = () if output_hidden_states else None
879
+ all_self_attentions = () if output_attentions else None
880
+
881
+ self._maybe_set_compile()
882
+
883
+ if input_ids is not None:
884
+ self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
885
+
886
+ if batch_size is None and seq_len is None:
887
+ if inputs_embeds is not None:
888
+ batch_size, seq_len = inputs_embeds.shape[:2]
889
+ else:
890
+ batch_size, seq_len = input_ids.shape[:2]
891
+ device = input_ids.device if input_ids is not None else inputs_embeds.device
892
+
893
+ if attention_mask is None:
894
+ attention_mask = torch.ones((batch_size, seq_len), device=device, dtype=torch.bool)
895
+
896
+ repad = False
897
+ if self.config._attn_implementation == "flash_attention_2":
898
+ if indices is None and cu_seqlens is None and max_seqlen is None:
899
+ repad = True
900
+ if inputs_embeds is None:
901
+ with torch.no_grad():
902
+ input_ids, indices, cu_seqlens, max_seqlen, *_ = _unpad_modernbert_input(
903
+ inputs=input_ids, attention_mask=attention_mask
904
+ )
905
+ else:
906
+ inputs_embeds, indices, cu_seqlens, max_seqlen, *_ = _unpad_modernbert_input(
907
+ inputs=inputs_embeds, attention_mask=attention_mask
908
+ )
909
+ else:
910
+ if position_ids is None:
911
+ position_ids = torch.arange(seq_len, device=device).unsqueeze(0)
912
+
913
+ attention_mask, sliding_window_mask = self._update_attention_mask(
914
+ attention_mask, output_attentions=output_attentions
915
+ )
916
+
917
+ hidden_states = self.embeddings(input_ids=input_ids, inputs_embeds=inputs_embeds)
918
+
919
+ for encoder_layer in self.layers:
920
+ if output_hidden_states:
921
+ all_hidden_states = all_hidden_states + (hidden_states,)
922
+
923
+ if self.gradient_checkpointing and self.training:
924
+ layer_outputs = self._gradient_checkpointing_func(
925
+ encoder_layer.__call__,
926
+ hidden_states,
927
+ attention_mask,
928
+ sliding_window_mask,
929
+ position_ids,
930
+ cu_seqlens,
931
+ max_seqlen,
932
+ output_attentions,
933
+ )
934
+ else:
935
+ layer_outputs = encoder_layer(
936
+ hidden_states,
937
+ attention_mask=attention_mask,
938
+ sliding_window_mask=sliding_window_mask,
939
+ position_ids=position_ids,
940
+ cu_seqlens=cu_seqlens,
941
+ max_seqlen=max_seqlen,
942
+ output_attentions=output_attentions,
943
+ )
944
+ hidden_states = layer_outputs[0]
945
+ if output_attentions and len(layer_outputs) > 1:
946
+ all_self_attentions = all_self_attentions + (layer_outputs[1],)
947
+
948
+ if output_hidden_states:
949
+ all_hidden_states = all_hidden_states + (hidden_states,)
950
+
951
+ hidden_states = self.final_norm(hidden_states)
952
+
953
+ if repad:
954
+ hidden_states = _pad_modernbert_output(
955
+ inputs=hidden_states, indices=indices, batch=batch_size, seqlen=seq_len
956
+ )
957
+ if all_hidden_states is not None:
958
+ all_hidden_states = tuple(
959
+ _pad_modernbert_output(inputs=hs, indices=indices, batch=batch_size, seqlen=seq_len)
960
+ for hs in all_hidden_states
961
+ )
962
+
963
+ if not return_dict:
964
+ return tuple(v for v in [hidden_states, all_hidden_states, all_self_attentions] if v is not None)
965
+ return BaseModelOutput(
966
+ last_hidden_state=hidden_states,
967
+ hidden_states=all_hidden_states,
968
+ attentions=all_self_attentions,
969
+ )
970
+
971
+ def _update_attention_mask(self, attention_mask: torch.Tensor, output_attentions: bool) -> Tuple[torch.Tensor, torch.Tensor]:
972
+ if output_attentions:
973
+ if self.config._attn_implementation == "sdpa":
974
+ logger.warning_once(
975
+ "Outputting attentions is only supported with the 'eager' attention implementation, "
976
+ 'not with "sdpa". Falling back to `attn_implementation="eager"`.'
977
+ )
978
+ self.config._attn_implementation = "eager"
979
+ elif self.config._attn_implementation != "eager":
980
+ logger.warning_once(
981
+ "Outputting attentions is only supported with the eager attention implementation, "
982
+ f'not with {self.config._attn_implementation}. Consider setting `attn_implementation="eager"`.'
983
+ " Setting `output_attentions=False`."
984
+ )
985
+
986
+ global_attention_mask = _prepare_4d_attention_mask(attention_mask, self.dtype)
987
+
988
+ # Create position indices
989
+ rows = torch.arange(global_attention_mask.shape[2]).unsqueeze(0)
990
+ # Calculate distance between positions
991
+ distance = torch.abs(rows - rows.T)
992
+
993
+ # Create sliding window mask (1 for positions within window, 0 outside)
994
+ window_mask = (
995
+ (distance <= self.config.local_attention // 2).unsqueeze(0).unsqueeze(0).to(attention_mask.device)
996
+ )
997
+ # Combine with existing mask
998
+ sliding_window_mask = global_attention_mask.masked_fill(window_mask.logical_not(), torch.finfo(self.dtype).min)
999
+
1000
+ return global_attention_mask, sliding_window_mask
1001
+
1002
+
1003
+ class ModernBertPredictionHead(nn.Module):
1004
+ def __init__(self, config: ModernBertConfig):
1005
+ super().__init__()
1006
+ self.config = config
1007
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size, config.classifier_bias)
1008
+ self.act = ACT2FN[config.classifier_activation]
1009
+ self.norm = nn.LayerNorm(config.hidden_size, eps=config.norm_eps, bias=config.norm_bias)
1010
+
1011
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
1012
+ return self.norm(self.act(self.dense(hidden_states)))
1013
+
1014
+
1015
+ @add_start_docstrings(
1016
+ "The ModernBert Model with a decoder head on top that is used for masked language modeling.",
1017
+ MODERNBERT_START_DOCSTRING,
1018
+ )
1019
+ class ModernBertForMaskedLM(ModernBertPreTrainedModel):
1020
+ _tied_weights_keys = ["decoder.weight"]
1021
+
1022
+ def __init__(self, config: ModernBertConfig):
1023
+ super().__init__(config)
1024
+ self.config = config
1025
+ self.model = ModernBertModel(config)
1026
+ self.head = ModernBertPredictionHead(config)
1027
+ self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=config.decoder_bias)
1028
+
1029
+ self.sparse_prediction = self.config.sparse_prediction
1030
+ self.sparse_pred_ignore_index = self.config.sparse_pred_ignore_index
1031
+
1032
+ # Initialize weights and apply final processing
1033
+ self.post_init()
1034
+
1035
+ def get_output_embeddings(self):
1036
+ return self.decoder
1037
+
1038
+ def set_output_embeddings(self, new_embeddings: nn.Linear):
1039
+ self.decoder = new_embeddings
1040
+
1041
+ @torch.compile(dynamic=True)
1042
+ def compiled_head(self, output: torch.Tensor) -> torch.Tensor:
1043
+ return self.decoder(self.head(output))
1044
+
1045
+ @add_start_docstrings_to_model_forward(MODERNBERT_INPUTS_DOCSTRING)
1046
+ @add_code_sample_docstrings(
1047
+ checkpoint=_CHECKPOINT_FOR_DOC,
1048
+ output_type=MaskedLMOutput,
1049
+ config_class=_CONFIG_FOR_DOC,
1050
+ )
1051
+ def forward(
1052
+ self,
1053
+ input_ids: Optional[torch.LongTensor] = None,
1054
+ attention_mask: Optional[torch.Tensor] = None,
1055
+ sliding_window_mask: Optional[torch.Tensor] = None,
1056
+ position_ids: Optional[torch.Tensor] = None,
1057
+ inputs_embeds: Optional[torch.Tensor] = None,
1058
+ labels: Optional[torch.Tensor] = None,
1059
+ indices: Optional[torch.Tensor] = None,
1060
+ cu_seqlens: Optional[torch.Tensor] = None,
1061
+ max_seqlen: Optional[int] = None,
1062
+ batch_size: Optional[int] = None,
1063
+ seq_len: Optional[int] = None,
1064
+ output_attentions: Optional[bool] = None,
1065
+ output_hidden_states: Optional[bool] = None,
1066
+ return_dict: Optional[bool] = None,
1067
+ **kwargs,
1068
+ ) -> Union[Tuple[torch.Tensor], MaskedLMOutput]:
1069
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1070
+ self._maybe_set_compile()
1071
+
1072
+ if self.config._attn_implementation == "flash_attention_2":
1073
+ if indices is None and cu_seqlens is None and max_seqlen is None:
1074
+ if batch_size is None and seq_len is None:
1075
+ if inputs_embeds is not None:
1076
+ batch_size, seq_len = inputs_embeds.shape[:2]
1077
+ else:
1078
+ batch_size, seq_len = input_ids.shape[:2]
1079
+ device = input_ids.device if input_ids is not None else inputs_embeds.device
1080
+
1081
+ if attention_mask is None:
1082
+ attention_mask = torch.ones((batch_size, seq_len), device=device, dtype=torch.bool)
1083
+
1084
+ if inputs_embeds is None:
1085
+ with torch.no_grad():
1086
+ input_ids, indices, cu_seqlens, max_seqlen, position_ids, labels = _unpad_modernbert_input(
1087
+ inputs=input_ids, attention_mask=attention_mask, position_ids=position_ids, labels=labels
1088
+ )
1089
+ else:
1090
+ inputs_embeds, indices, cu_seqlens, max_seqlen, position_ids, labels = _unpad_modernbert_input(
1091
+ inputs=inputs_embeds, attention_mask=attention_mask, position_ids=position_ids, labels=labels
1092
+ )
1093
+
1094
+ outputs = self.model(
1095
+ input_ids=input_ids,
1096
+ attention_mask=attention_mask,
1097
+ sliding_window_mask=sliding_window_mask,
1098
+ position_ids=position_ids,
1099
+ inputs_embeds=inputs_embeds,
1100
+ indices=indices,
1101
+ cu_seqlens=cu_seqlens,
1102
+ max_seqlen=max_seqlen,
1103
+ batch_size=batch_size,
1104
+ seq_len=seq_len,
1105
+ output_attentions=output_attentions,
1106
+ output_hidden_states=output_hidden_states,
1107
+ return_dict=return_dict,
1108
+ )
1109
+ last_hidden_state = outputs[0]
1110
+
1111
+ if self.sparse_prediction and labels is not None:
1112
+ # flatten labels and output first
1113
+ labels = labels.view(-1)
1114
+ last_hidden_state = last_hidden_state.view(labels.shape[0], -1)
1115
+
1116
+ # then filter out the non-masked tokens
1117
+ mask_tokens = labels != self.sparse_pred_ignore_index
1118
+ last_hidden_state = last_hidden_state[mask_tokens]
1119
+ labels = labels[mask_tokens]
1120
+
1121
+ logits = (
1122
+ self.compiled_head(last_hidden_state)
1123
+ if self.config.reference_compile
1124
+ else self.decoder(self.head(last_hidden_state))
1125
+ )
1126
+
1127
+ loss = None
1128
+ if labels is not None:
1129
+ loss = self.loss_function(logits, labels, vocab_size=self.config.vocab_size)
1130
+
1131
+ if self.config._attn_implementation == "flash_attention_2":
1132
+ with torch.no_grad():
1133
+ logits = _pad_modernbert_output(inputs=logits, indices=indices, batch=batch_size, seqlen=seq_len)
1134
+ if not return_dict:
1135
+ output = (logits,)
1136
+ return ((loss,) + output) if loss is not None else output
1137
+
1138
+ return MaskedLMOutput(
1139
+ loss=loss,
1140
+ logits=logits,
1141
+ hidden_states=outputs.hidden_states,
1142
+ attentions=outputs.attentions,
1143
+ )
1144
+
1145
+
1146
+ @add_start_docstrings(
1147
+ "The ModernBert Model with a sequence classification head on top that performs pooling.",
1148
+ MODERNBERT_START_DOCSTRING,
1149
+ )
1150
+ class ModernBertForSequenceClassification(ModernBertPreTrainedModel):
1151
+ def __init__(self, config: ModernBertConfig):
1152
+ super().__init__(config)
1153
+ self.num_labels = config.num_labels
1154
+ self.config = config
1155
+
1156
+ self.model = ModernBertModel(config)
1157
+ self.head = ModernBertPredictionHead(config)
1158
+ self.drop = torch.nn.Dropout(config.classifier_dropout)
1159
+ self.classifier = nn.Linear(config.hidden_size, config.num_labels)
1160
+
1161
+ # Initialize weights and apply final processing
1162
+ self.post_init()
1163
+
1164
+ @add_start_docstrings_to_model_forward(MODERNBERT_INPUTS_DOCSTRING)
1165
+ @add_code_sample_docstrings(
1166
+ checkpoint=_CHECKPOINT_FOR_DOC,
1167
+ output_type=SequenceClassifierOutput,
1168
+ config_class=_CONFIG_FOR_DOC,
1169
+ )
1170
+ def forward(
1171
+ self,
1172
+ input_ids: Optional[torch.LongTensor] = None,
1173
+ attention_mask: Optional[torch.Tensor] = None,
1174
+ sliding_window_mask: Optional[torch.Tensor] = None,
1175
+ position_ids: Optional[torch.Tensor] = None,
1176
+ inputs_embeds: Optional[torch.Tensor] = None,
1177
+ labels: Optional[torch.Tensor] = None,
1178
+ indices: Optional[torch.Tensor] = None,
1179
+ cu_seqlens: Optional[torch.Tensor] = None,
1180
+ max_seqlen: Optional[int] = None,
1181
+ batch_size: Optional[int] = None,
1182
+ seq_len: Optional[int] = None,
1183
+ output_attentions: Optional[bool] = None,
1184
+ output_hidden_states: Optional[bool] = None,
1185
+ return_dict: Optional[bool] = None,
1186
+ **kwargs,
1187
+ ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]:
1188
+ r"""
1189
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1190
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
1191
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
1192
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
1193
+ """
1194
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1195
+ self._maybe_set_compile()
1196
+
1197
+ outputs = self.model(
1198
+ input_ids=input_ids,
1199
+ attention_mask=attention_mask,
1200
+ sliding_window_mask=sliding_window_mask,
1201
+ position_ids=position_ids,
1202
+ inputs_embeds=inputs_embeds,
1203
+ indices=indices,
1204
+ cu_seqlens=cu_seqlens,
1205
+ max_seqlen=max_seqlen,
1206
+ batch_size=batch_size,
1207
+ seq_len=seq_len,
1208
+ output_attentions=output_attentions,
1209
+ output_hidden_states=output_hidden_states,
1210
+ return_dict=return_dict,
1211
+ )
1212
+ last_hidden_state = outputs[0]
1213
+
1214
+ if self.config.classifier_pooling == "cls":
1215
+ last_hidden_state = last_hidden_state[:, 0]
1216
+ elif self.config.classifier_pooling == "mean":
1217
+ last_hidden_state = (last_hidden_state * attention_mask.unsqueeze(-1)).sum(dim=1) / attention_mask.sum(
1218
+ dim=1, keepdim=True
1219
+ )
1220
+
1221
+ pooled_output = self.head(last_hidden_state)
1222
+ pooled_output = self.drop(pooled_output)
1223
+ logits = self.classifier(pooled_output)
1224
+
1225
+ loss = None
1226
+ if labels is not None:
1227
+ if self.config.problem_type is None:
1228
+ if self.num_labels == 1:
1229
+ self.config.problem_type = "regression"
1230
+ elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
1231
+ self.config.problem_type = "single_label_classification"
1232
+ else:
1233
+ self.config.problem_type = "multi_label_classification"
1234
+
1235
+ if self.config.problem_type == "regression":
1236
+ loss_fct = MSELoss()
1237
+ if self.num_labels == 1:
1238
+ loss = loss_fct(logits.squeeze(), labels.squeeze())
1239
+ else:
1240
+ loss = loss_fct(logits, labels)
1241
+ elif self.config.problem_type == "single_label_classification":
1242
+ loss_fct = CrossEntropyLoss()
1243
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
1244
+ elif self.config.problem_type == "multi_label_classification":
1245
+ loss_fct = BCEWithLogitsLoss()
1246
+ loss = loss_fct(logits, labels)
1247
+
1248
+ if not return_dict:
1249
+ output = (logits,)
1250
+ return ((loss,) + output) if loss is not None else output
1251
+
1252
+ return SequenceClassifierOutput(
1253
+ loss=loss,
1254
+ logits=logits,
1255
+ hidden_states=outputs.hidden_states,
1256
+ attentions=outputs.attentions,
1257
+ )
1258
+
1259
+
1260
+ @add_start_docstrings(
1261
+ "The ModernBert Model with a token classification head on top, e.g. for Named Entity Recognition (NER) tasks.",
1262
+ MODERNBERT_START_DOCSTRING,
1263
+ )
1264
+ class ModernBertForTokenClassification(ModernBertPreTrainedModel):
1265
+ def __init__(self, config: ModernBertConfig):
1266
+ super().__init__(config)
1267
+ self.num_labels = config.num_labels
1268
+
1269
+ self.model = ModernBertModel(config)
1270
+ self.head = ModernBertPredictionHead(config)
1271
+ self.drop = torch.nn.Dropout(config.classifier_dropout)
1272
+ self.classifier = nn.Linear(config.hidden_size, config.num_labels)
1273
+
1274
+ # Initialize weights and apply final processing
1275
+ self.post_init()
1276
+
1277
+ @add_start_docstrings_to_model_forward(MODERNBERT_INPUTS_DOCSTRING)
1278
+ @add_code_sample_docstrings(
1279
+ checkpoint=_CHECKPOINT_FOR_DOC,
1280
+ output_type=TokenClassifierOutput,
1281
+ config_class=_CONFIG_FOR_DOC,
1282
+ )
1283
+ def forward(
1284
+ self,
1285
+ input_ids: Optional[torch.LongTensor] = None,
1286
+ attention_mask: Optional[torch.Tensor] = None,
1287
+ sliding_window_mask: Optional[torch.Tensor] = None,
1288
+ position_ids: Optional[torch.Tensor] = None,
1289
+ inputs_embeds: Optional[torch.Tensor] = None,
1290
+ labels: Optional[torch.Tensor] = None,
1291
+ indices: Optional[torch.Tensor] = None,
1292
+ cu_seqlens: Optional[torch.Tensor] = None,
1293
+ max_seqlen: Optional[int] = None,
1294
+ batch_size: Optional[int] = None,
1295
+ seq_len: Optional[int] = None,
1296
+ output_attentions: Optional[bool] = None,
1297
+ output_hidden_states: Optional[bool] = None,
1298
+ return_dict: Optional[bool] = None,
1299
+ ) -> Union[Tuple[torch.Tensor], TokenClassifierOutput]:
1300
+ r"""
1301
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
1302
+ Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
1303
+ """
1304
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1305
+ self._maybe_set_compile()
1306
+
1307
+ outputs = self.model(
1308
+ input_ids=input_ids,
1309
+ attention_mask=attention_mask,
1310
+ sliding_window_mask=sliding_window_mask,
1311
+ position_ids=position_ids,
1312
+ inputs_embeds=inputs_embeds,
1313
+ indices=indices,
1314
+ cu_seqlens=cu_seqlens,
1315
+ max_seqlen=max_seqlen,
1316
+ batch_size=batch_size,
1317
+ seq_len=seq_len,
1318
+ output_attentions=output_attentions,
1319
+ output_hidden_states=output_hidden_states,
1320
+ return_dict=return_dict,
1321
+ )
1322
+ last_hidden_state = outputs[0]
1323
+
1324
+ last_hidden_state = self.head(last_hidden_state)
1325
+ last_hidden_state = self.drop(last_hidden_state)
1326
+ logits = self.classifier(last_hidden_state)
1327
+
1328
+ loss = None
1329
+ if labels is not None:
1330
+ loss_fct = CrossEntropyLoss()
1331
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
1332
+
1333
+ if not return_dict:
1334
+ output = (logits,) + outputs[1:]
1335
+ return ((loss,) + output) if loss is not None else output
1336
+
1337
+ return TokenClassifierOutput(
1338
+ loss=loss,
1339
+ logits=logits,
1340
+ hidden_states=outputs.hidden_states,
1341
+ attentions=outputs.attentions,
1342
+ )
1343
+
1344
+
1345
+ __all__ = [
1346
+ "ModernBertModel",
1347
+ "ModernBertPreTrainedModel",
1348
+ "ModernBertForMaskedLM",
1349
+ "ModernBertForSequenceClassification",
1350
+ "ModernBertForTokenClassification",
1351
+ ]
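
The unpadding helpers above (`_unpad_modernbert_input` / `_pad_modernbert_output`) are what let the Flash Attention 2 path run on one flat concatenated sequence. Below is a minimal, self-contained sketch of that round trip, reimplemented with plain PyTorch so it runs outside the model file; the real helpers also carry `position_ids` and `labels` through.

```py
import torch

# A minimal sketch (not the library code) of the unpad/repad round trip described above.
# attention_mask marks valid tokens; unpadding gathers them into one flat sequence,
# repadding scatters them back into a zero-filled (batch, seqlen) buffer.
attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])   # (batch=2, seqlen=4)
inputs = torch.arange(8).view(2, 4)                            # dummy token ids

seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)        # tensor([3, 2])
indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
cu_seqlens = torch.nn.functional.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))
unpadded = inputs.flatten()[indices]                            # (total_nnz=5,)

# Repad: scatter the unpadded values back to their original positions.
output = torch.zeros(2 * 4, dtype=inputs.dtype)
output[indices] = unpadded
repadded = output.view(2, 4)

assert torch.equal(repadded * attention_mask, inputs * attention_mask)
print(cu_seqlens.tolist())   # [0, 3, 5]
```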
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:844fe48fcb5c513dd8c954d8d4d546d4fa8f532babfa99dc4e2ec6feabce90b1
3
+ size 1392836610
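
The checkpoint above is stored as a Git LFS pointer, so the sha256 oid and byte size can be used to verify a local download. A small sketch, assuming the file has been fetched to `pytorch_model.bin` in the current directory:

```py
import hashlib

# Optional integrity check of a locally downloaded checkpoint against the LFS pointer
# above (sha256 oid). The local file name/path is an assumption.
sha = hashlib.sha256()
with open("pytorch_model.bin", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)
print(sha.hexdigest() == "844fe48fcb5c513dd8c954d8d4d546d4fa8f532babfa99dc4e2ec6feabce90b1")
```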
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "cls_token": {
10
+ "content": "[CLS]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "[SEP]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "mask_token": {
24
+ "content": "[MASK]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "pad_token": {
31
+ "content": "[PAD]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ },
37
+ "sep_token": {
38
+ "content": "[SEP]",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false
43
+ },
44
+ "unk_token": {
45
+ "content": "[UNK]",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false
50
+ }
51
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[CLS]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "[PAD]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "[SEP]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "3": {
28
+ "content": "[UNK]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "4": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "bos_token": "[CLS]",
45
+ "clean_up_tokenization_spaces": false,
46
+ "cls_token": "[CLS]",
47
+ "do_lower_case": false,
48
+ "eos_token": "[SEP]",
49
+ "extra_special_tokens": {},
50
+ "keep_accents": true,
51
+ "mask_token": "[MASK]",
52
+ "model_input_names": [
53
+ "input_ids",
54
+ "attention_mask"
55
+ ],
56
+ "model_max_length": 1000000000000000019884624838656,
57
+ "pad_token": "[PAD]",
58
+ "sep_token": "[SEP]",
59
+ "split_by_punct": true,
60
+ "tokenizer_class": "DebertaV2TokenizerFast",
61
+ "unk_token": "[UNK]"
62
+ }
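
A quick sanity check of the tokenizer configuration above: a `DebertaV2TokenizerFast` whose special tokens `[CLS]`, `[PAD]`, `[SEP]`, `[UNK]`, `[MASK]` map to ids 0–4. The hub id is assumed to be this repository.

```py
from transformers import AutoTokenizer

# Verify the tokenizer class and special-token ids listed in tokenizer_config.json above.
tokenizer = AutoTokenizer.from_pretrained("KoichiYasuoka/modernbert-large-thai-wikipedia-ud-embeds")
print(type(tokenizer).__name__)   # DebertaV2TokenizerFast
print(tokenizer.cls_token_id, tokenizer.pad_token_id, tokenizer.sep_token_id,
      tokenizer.unk_token_id, tokenizer.mask_token_id)   # 0 1 2 3 4
```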
ud.py ADDED
@@ -0,0 +1,150 @@
1
+ import numpy
2
+ from transformers import TokenClassificationPipeline
3
+
4
+ class BellmanFordTokenClassificationPipeline(TokenClassificationPipeline):
5
+ def __init__(self,**kwargs):
6
+ super().__init__(**kwargs)
7
+ x=self.model.config.label2id
8
+ y=[k for k in x if k.find("|")<0 and not k.startswith("I-")]
9
+ self.transition=numpy.full((len(x),len(x)),-numpy.inf)
10
+ for k,v in x.items():
11
+ if k.find("|")<0:
12
+ for j in ["I-"+k[2:]] if k.startswith("B-") else [k]+y if k.startswith("I-") else y:
13
+ self.transition[v,x[j]]=0
14
+ def check_model_type(self,supported_models):
15
+ pass
16
+ def postprocess(self,model_outputs,**kwargs):
17
+ if "logits" not in model_outputs:
18
+ return self.postprocess(model_outputs[0],**kwargs)
19
+ return self.bellman_ford_token_classification(model_outputs,**kwargs)
20
+ def bellman_ford_token_classification(self,model_outputs,**kwargs):
21
+ m=model_outputs["logits"][0].numpy()
22
+ e=numpy.exp(m-numpy.max(m,axis=-1,keepdims=True))
23
+ z=e/e.sum(axis=-1,keepdims=True)
24
+ for i in range(m.shape[0]-1,0,-1):
25
+ m[i-1]+=numpy.max(m[i]+self.transition,axis=1)
26
+ k=[numpy.argmax(m[0]+self.transition[0])]
27
+ for i in range(1,m.shape[0]):
28
+ k.append(numpy.argmax(m[i]+self.transition[k[-1]]))
29
+ w=[{"entity":self.model.config.id2label[j],"start":s,"end":e,"score":z[i,j]} for i,((s,e),j) in enumerate(zip(model_outputs["offset_mapping"][0].tolist(),k)) if s<e]
30
+ if "aggregation_strategy" in kwargs and kwargs["aggregation_strategy"]!="none":
31
+ for i,t in reversed(list(enumerate(w))):
32
+ p=t.pop("entity")
33
+ if p.startswith("I-"):
34
+ w[i-1]["score"]=min(w[i-1]["score"],t["score"])
35
+ w[i-1]["end"]=w.pop(i)["end"]
36
+ elif p.startswith("B-"):
37
+ t["entity_group"]=p[2:]
38
+ else:
39
+ t["entity_group"]=p
40
+ for t in w:
41
+ t["text"]=model_outputs["sentence"][t["start"]:t["end"]]
42
+ return w
43
+
44
+ class UniversalDependenciesPipeline(BellmanFordTokenClassificationPipeline):
45
+ def __init__(self,**kwargs):
46
+ kwargs["aggregation_strategy"]="simple"
47
+ super().__init__(**kwargs)
48
+ x=self.model.config.label2id
49
+ self.root=numpy.full((len(x)),-numpy.inf)
50
+ self.left_arc=numpy.full((len(x)),-numpy.inf)
51
+ self.right_arc=numpy.full((len(x)),-numpy.inf)
52
+ for k,v in x.items():
53
+ if k.endswith("|root"):
54
+ self.root[v]=0
55
+ elif k.find("|l-")>0:
56
+ self.left_arc[v]=0
57
+ elif k.find("|r-")>0:
58
+ self.right_arc[v]=0
59
+ def postprocess(self,model_outputs,**kwargs):
60
+ import torch
61
+ kwargs["aggregation_strategy"]="simple"
62
+ if "logits" not in model_outputs:
63
+ return self.postprocess(model_outputs[0],**kwargs)
64
+ w=self.bellman_ford_token_classification(model_outputs,**kwargs)
65
+ off=[(t["start"],t["end"]) for t in w]
66
+ for i,(s,e) in reversed(list(enumerate(off))):
67
+ if s<e:
68
+ d=w[i]["text"]
69
+ j=len(d)-len(d.lstrip())
70
+ if j>0:
71
+ d=d.lstrip()
72
+ off[i]=(off[i][0]+j,off[i][1])
73
+ j=len(d)-len(d.rstrip())
74
+ if j>0:
75
+ d=d.rstrip()
76
+ off[i]=(off[i][0],off[i][1]-j)
77
+ if d.strip()=="":
78
+ off.pop(i)
79
+ w.pop(i)
80
+ v=self.tokenizer([t["text"] for t in w],add_special_tokens=False)
81
+ x=[not t["entity_group"].endswith(".") for t in w]
82
+ if len(x)<127:
83
+ x=[True]*len(x)
84
+ else:
85
+ k=sum([len(x)-i+1 if b else 0 for i,b in enumerate(x)])+1
86
+ for i in numpy.argsort(numpy.array([t["score"] for t in w])):
87
+ if x[i]==False and k+len(x)-i<8192:
88
+ x[i]=True
89
+ k+=len(x)-i+1
90
+ ids=[-1]
91
+ for i in range(len(x)):
92
+ if x[i]:
93
+ ids.append(i)
94
+ for j in range(i+1,len(x)):
95
+ ids.append(j)
96
+ ids.append(-1)
97
+ with torch.no_grad():
98
+ e=self.model.get_input_embeddings().weight
99
+ m=[]
100
+ for j in v["input_ids"]:
101
+ if j==[]:
102
+ j=[self.tokenizer.unk_token_id]
103
+ m.append(e[j,:].sum(axis=0))
104
+ m.append(e[self.tokenizer.sep_token_id,:])
105
+ m=torch.stack(m).to(self.device)
106
+ e=self.model(inputs_embeds=torch.unsqueeze(m[ids,:],0))
107
+ m=e.logits[0].cpu().numpy()
108
+ e=numpy.full((len(x),len(x),m.shape[-1]),m.min())
109
+ k=1
110
+ for i in range(len(x)):
111
+ if x[i]:
112
+ e[i,i]=m[k]+self.root
113
+ k+=1
114
+ for j in range(1,len(x)-i):
115
+ e[i+j,i]=m[k]+self.left_arc
116
+ e[i,i+j]=m[k]+self.right_arc
117
+ k+=1
118
+ k+=1
119
+ m,p=numpy.max(e,axis=2),numpy.argmax(e,axis=2)
120
+ h=self.chu_liu_edmonds(m)
121
+ z=[i for i,j in enumerate(h) if i==j]
122
+ if len(z)>1:
123
+ k,h=z[numpy.argmax(m[z,z])],numpy.min(m)-numpy.max(m)
124
+ m[:,z]+=[[0 if j in z and (i!=j or i==k) else h for i in z] for j in range(m.shape[0])]
125
+ h=self.chu_liu_edmonds(m)
126
+ q=[self.model.config.id2label[p[j,i]].split("|") for i,j in enumerate(h)]
127
+ t=model_outputs["sentence"].replace("\n"," ")
128
+ u="# text = "+t+"\n"
129
+ for i,(s,e) in enumerate(off):
130
+ u+="\t".join([str(i+1),t[s:e],t[s:e],q[i][0],"_","_" if len(q[i])<3 else "|".join(q[i][1:-1]),str(0 if h[i]==i else h[i]+1),"root" if q[i][-1]=="root" else q[i][-1][2:],"_","_" if i+1<len(off) and e<off[i+1][0] else "SpaceAfter=No"])+"\n"
131
+ return u+"\n"
132
+ def chu_liu_edmonds(self,matrix):
133
+ h=numpy.argmax(matrix,axis=0)
134
+ x=[-1 if i==j else j for i,j in enumerate(h)]
135
+ for b in [lambda x,i,j:-1 if i not in x else x[i],lambda x,i,j:-1 if j<0 else x[j]]:
136
+ y=[]
137
+ while x!=y:
138
+ y=list(x)
139
+ for i,j in enumerate(x):
140
+ x[i]=b(x,i,j)
141
+ if max(x)<0:
142
+ return h
143
+ y,x=[i for i,j in enumerate(x) if j==max(x)],[i for i,j in enumerate(x) if j<max(x)]
144
+ z=matrix-numpy.max(matrix,axis=0)
145
+ m=numpy.block([[z[x,:][:,x],numpy.max(z[x,:][:,y],axis=1).reshape(len(x),1)],[numpy.max(z[y,:][:,x],axis=0),numpy.max(z[y,y])]])
146
+ k=[j if i==len(x) else x[j] if j<len(x) else y[numpy.argmax(z[y,x[i]])] for i,j in enumerate(self.chu_liu_edmonds(m))]
147
+ h=[j if i in y else k[x.index(i)] for i,j in enumerate(h)]
148
+ i=y[numpy.argmax(z[x[k[-1]],y] if k[-1]<len(x) else z[y,y])]
149
+ h[i]=x[k[-1]] if k[-1]<len(x) else i
150
+ return h
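
In `chu_liu_edmonds` above, `matrix[i, j]` scores token `i` as the head of token `j`; the returned list `h` gives each token's head index, with `h[j] == j` marking the root, and the CoNLL-U writer emits `0` for the root and `h[j] + 1` otherwise. A toy numpy sketch of that convention (greedy head selection only; the method additionally contracts cycles when the greedy choice is not a tree):

```py
import numpy

# Toy sketch of the scoring convention used by chu_liu_edmonds() above: scores[i, j] is the
# score of token i being the head of token j. Greedy column-wise argmax picks each token's
# head; a token that selects itself is the root.
scores = numpy.array([
    [ 5.0,  2.0,  0.5],   # token 0 as candidate head
    [ 1.0, -1.0,  3.0],   # token 1 as candidate head
    [ 0.0,  1.0, -2.0],   # token 2 as candidate head
])
h = numpy.argmax(scores, axis=0)   # array([0, 0, 1]): token 0 is root, heads token 1; token 1 heads token 2
print([0 if h[j] == j else int(h[j]) + 1 for j in range(len(h))])   # CoNLL-U HEAD column: [0, 1, 2]
```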