niksmer committed
Commit 9d4c41f · 1 Parent(s): b560bd9

Update README.md

Files changed (1): README.md (+71, -13)

README.md CHANGED
@@ -1,7 +1,5 @@
---
license: mit
- tags:
- - generated_from_trainer
metrics:
- accuracy
- precision
 
@@ -31,9 +29,43 @@ This model was trained on 115,943 manually annotated sentences to classify text

## Intended uses & limitations

- More information needed
+ The model output reproduces the limitations of the dataset in terms of country coverage, time span, domain definitions and potential biases of the annotators, as any supervised machine learning model would. Applying the model to other kinds of data (other text types, other countries, etc.) will reduce performance.
+
+ ```python
+ from transformers import pipeline
+ import pandas as pd
+
+ classifier = pipeline(
+     task="text-classification",
+     model="niksmer/ManiBERT")
+
+ # Load the text data you want to classify (assumes a "text" column)
+ text = pd.read_csv("text.csv")
+ # Inference: pass a list of strings, not the whole DataFrame
+ output = classifier(text["text"].tolist())
+ # Print the output
+ print(pd.DataFrame(output).head())
+ ```
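+
+ Each element of `output` is a dict with `label` and `score` keys, so the DataFrame above has one row per input sentence with the predicted category and its confidence. (The `"text"` column name is an assumption about the CSV, not part of the model card.)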
+
+ ## Train Data
+
+ ManiBERT was trained on the English-speaking subset of the [Manifesto Project Dataset (MPDS2021a)](https://manifesto-project.wzb.eu/datasets): 115,943 sentences from 163 political manifestos published between 1992 and 2020 in seven English-speaking countries (Australia, Canada, Ireland, New Zealand, South Africa, the United Kingdom and the United States).
+
+ | Country        | Count manifestos | Count sentences | Time span        |
+ |----------------|------------------|-----------------|------------------|
+ | Australia      | 18               | 14,887          | 2010-2016        |
+ | Ireland        | 23               | 24,966          | 2007-2016        |
+ | Canada         | 14               | 12,344          | 2004-2008 & 2015 |
+ | New Zealand    | 46               | 35,079          | 1993-2017        |
+ | South Africa   | 29               | 13,334          | 1994-2019        |
+ | USA            | 9                | 13,188          | 1992 & 2004-2020 |
+ | United Kingdom | 34               | 30,936          | 1997-2019        |
+
+ Canadian manifestos published between 2004 and 2008 are used as test data.
+
+ The resulting datasets are highly imbalanced; see the Evaluation section below.
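+
+ A minimal sketch of that split, assuming a flat sentence-level export with hypothetical `country`, `year` and `label` columns (not the actual MPDS schema):
+
+ ```python
+ import pandas as pd
+
+ # Hypothetical export of the annotated sentences
+ df = pd.read_csv("manifesto_sentences.csv")
+
+ # Canadian manifestos from 2004-2008 form the held-out test set
+ is_test = (df["country"] == "Canada") & df["year"].between(2004, 2008)
+ test_df, train_df = df[is_test], df[~is_test]
+
+ # Inspect the (highly imbalanced) label distribution
+ print(train_df["label"].value_counts(normalize=True).head(10))
+ ```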
+
+ ## Evaluation

- ## Training and evaluation data

| Description | Label | Count Train Data | Count Validation Data | Count Test Data | Validation F1-Score | Test F1-Score |
|-------------------------------------------------------------------|-------|------------------|-----------------------|-----------------|---------------------|---------------|
 
@@ -99,15 +131,24 @@ More information needed

### Training hyperparameters

The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 64
- - eval_batch_size: 64
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.05
- - num_epochs: 5
- - mixed_precision_training: Native AMP
+ ```python
+ training_args = TrainingArguments(
+     warmup_ratio=0.05,
+     weight_decay=0.1,
+     learning_rate=5e-05,
+     fp16=True,
+     evaluation_strategy="epoch",
+     num_train_epochs=5,
+     per_device_train_batch_size=16,
+     overwrite_output_dir=True,
+     per_device_eval_batch_size=16,
+     save_strategy="no",
+     logging_dir="logs",
+     logging_strategy="steps",
+     logging_steps=10,
+     push_to_hub=True,
+     hub_strategy="end")
+ ```
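+
+ The diff does not show the surrounding training script, so here is a rough sketch of how these arguments would plug into a `Trainer`; the base checkpoint, label count and dataset variables are placeholder assumptions:
+
+ ```python
+ from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
+                           Trainer, TrainingArguments)
+
+ num_labels = 56  # assumption: size of the Manifesto category scheme used here
+ model = AutoModelForSequenceClassification.from_pretrained(
+     "roberta-base", num_labels=num_labels)  # placeholder base checkpoint
+ tokenizer = AutoTokenizer.from_pretrained("roberta-base")
+
+ trainer = Trainer(
+     model=model,
+     args=training_args,           # the TrainingArguments defined above
+     train_dataset=train_dataset,  # placeholder: tokenized train split
+     eval_dataset=eval_dataset)    # placeholder: tokenized validation split
+ trainer.train()
+ ```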
### Training results

@@ -119,6 +160,23 @@ The following hyperparameters were used during training:
| 0.9263 | 4.0 | 7248 | 1.5173 | 0.5975 | 0.5975 | 0.4499 | 0.5901 | 0.5975 | 0.5975 |
| 0.7859 | 5.0 | 9060 | 1.5574 | 0.5978 | 0.5978 | 0.4564 | 0.5903 | 0.5978 | 0.5978 |
 
+ ### Overall evaluation
+
+ | Type       | Micro F1-Score | Macro F1-Score | Weighted F1-Score |
+ |------------|----------------|----------------|-------------------|
+ | Validation | 0.60           | 0.46           | 0.59              |
+ | Test       | 0.48           | 0.30           | 0.47              |
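+
+ These aggregates correspond to scikit-learn's F1 averaging modes; `y_true` and `y_pred` below are hypothetical stand-ins for the gold and predicted category ids:
+
+ ```python
+ from sklearn.metrics import f1_score
+
+ # Hypothetical gold and predicted labels; substitute real model output
+ y_true = [0, 1, 2, 2, 1, 0]
+ y_pred = [0, 1, 2, 1, 1, 0]
+
+ # Micro averages over all sentences; macro weights every label equally,
+ # which is why it drops furthest on imbalanced data; weighted re-weights
+ # the per-label scores by support
+ print(f1_score(y_true, y_pred, average="micro"))
+ print(f1_score(y_true, y_pred, average="macro"))
+ print(f1_score(y_true, y_pred, average="weighted"))
+ ```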
+
+ ### Evaluation based on saliency theory
+
+ Saliency theory is an approach to analysing political text data. In short, parties tend to write about the policy areas in which they believe they are seen as competent, and voters tend to assign policy competence in line with a party's assumed ideology. The share of a manifesto that a party devotes to each policy area can therefore be used to analyse the party's ideology.
+
+ For such an analysis, the Manifesto Project provides the RILE index. For a quick overview, see [this tutorial](https://manifesto-project.wzb.eu/down/tutorials/main-dataset.html#measuring-parties-left-right-positions).
+
+ The following plot shows the predicted and original RILE indices per manifesto in the test dataset. Overall, the Pearson correlation between the predicted and original RILE indices is 0.95. As an alternative, you can use [RoBERTa-RILE](https://huggingface.co/niksmer/RoBERTa-RILE).
+
+ ![image](english_manibert_manifesto.png)
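+
+ Here is a rough sketch of how such a RILE-style score can be derived from ManiBERT's predictions; the category sets below are illustrative stubs, not the official RILE category lists (see the linked tutorial for those):
+
+ ```python
+ # RILE is the share of sentences in "right" categories minus the share in
+ # "left" categories; the sets here are truncated illustrations only
+ right_categories = {"104", "201", "203"}
+ left_categories = {"103", "105", "202"}
+
+ # Hypothetical predicted category codes for one manifesto
+ predicted = ["104", "202", "408", "201", "105", "104"]
+
+ right_share = sum(c in right_categories for c in predicted) / len(predicted)
+ left_share = sum(c in left_categories for c in predicted) / len(predicted)
+ print(round(right_share - left_share, 3))  # RILE-style score in [-1, 1]
+ ```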
### Framework versions