SamLowe
/

all-MiniLM-L12-v2-go_emotions-classifier-onnx

 ---
+language: en
+tags:
+- text-classification
+- onnx
+- emotions
+- multi-class-classification
+- multi-label-classification
+datasets:
+- go_emotions
 license: mit
+inference: false
+widget:
+- text: ONNX is so much faster, its very handy!
 ---
+### Overview
+This is a multi-label, multi-class linear classifer for emotions that works with [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2), having been trained on the [go_emotions](https://huggingface.co/datasets/go_emotions) dataset.
+### Labels
+The 28 labels from the [go_emotions](https://huggingface.co/datasets/go_emotions) dataset are:
+```
+['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']
+```
+### Metrics (exact match of labels per item)
+This is a multi-label, multi-class dataset, so each label is effectively a separate binary classification. Evaluating across all labels per item in the go_emotions test split the metrics are shown below.
+Optimising the threshold per label to optimise the F1 metric, the metrics (evaluated on the go_emotions test split) are:
+- Precision: 0.378
+- Recall: 0.438
+- F1: 0.394
+Weighted by the relative support of each label in the dataset, this is:
+- Precision: 0.424
+- Recall: 0.590
+- F1: 0.481
+Using a fixed threshold of 0.5 to convert the scores to binary predictions for each label, the metrics (evaluated on the go_emotions test split, and unweighted by support) are:
+- Precision: 0.568
+- Recall: 0.214
+- F1: 0.260
+### Metrics (per-label)
+This is a multi-label, multi-class dataset, so each label is effectively a separate binary classification and metrics are better measured per label.
+Optimising the threshold per label to optimise the F1 metric, the metrics (evaluated on the go_emotions test split) are:
+|                |    f1 | precision | recall | support | threshold |
+| -------------- | ----- | --------- | ------ | ------- | --------- |
+| admiration     | 0.540 |     0.463 |  0.649 |     504 |      0.20 |
+| amusement      | 0.686 |     0.669 |  0.705 |     264 |      0.25 |
+| anger          | 0.419 |     0.373 |  0.480 |     198 |      0.15 |
+| annoyance      | 0.276 |     0.189 |  0.512 |     320 |      0.10 |
+| approval       | 0.299 |     0.260 |  0.350 |     351 |      0.15 |
+| caring         | 0.303 |     0.219 |  0.489 |     135 |      0.10 |
+| confusion      | 0.284 |     0.269 |  0.301 |     153 |      0.15 |
+| curiosity      | 0.365 |     0.310 |  0.444 |     284 |      0.15 |
+| desire         | 0.274 |     0.237 |  0.325 |      83 |      0.15 |
+| disappointment | 0.188 |     0.292 |  0.139 |     151 |      0.20 |
+| disapproval    | 0.305 |     0.257 |  0.375 |     267 |      0.15 |
+| disgust        | 0.450 |     0.462 |  0.439 |     123 |      0.20 |
+| embarrassment  | 0.348 |     0.375 |  0.324 |      37 |      0.30 |
+| excitement     | 0.313 |     0.306 |  0.320 |     103 |      0.20 |
+| fear           | 0.550 |     0.505 |  0.603 |      78 |      0.25 |
+| gratitude      | 0.776 |     0.774 |  0.778 |     352 |      0.30 |
+| grief          | 0.353 |     0.273 |  0.500 |       6 |      0.70 |
+| joy            | 0.370 |     0.361 |  0.379 |     161 |      0.20 |
+| love           | 0.626 |     0.717 |  0.555 |     238 |      0.35 |
+| nervousness    | 0.308 |     0.276 |  0.348 |      23 |      0.55 |
+| optimism       | 0.436 |     0.432 |  0.441 |     186 |      0.20 |
+| pride          | 0.444 |     0.545 |  0.375 |      16 |      0.60 |
+| realization    | 0.171 |     0.146 |  0.207 |     145 |      0.10 |
+| relief         | 0.133 |     0.250 |  0.091 |      11 |      0.60 |
+| remorse        | 0.468 |     0.426 |  0.518 |      56 |      0.30 |
+| sadness        | 0.413 |     0.409 |  0.417 |     156 |      0.20 |
+| surprise       | 0.314 |     0.303 |  0.326 |     141 |      0.15 |
+| neutral        | 0.622 |     0.482 |  0.879 |    1787 |      0.25 |
+### Use with ONNXRuntime
+The input to the model is called `logits`, and there is one output per label. Each output produces a 2d array, with 1 row per input row, and each row having 2 columns - the first being a proba output for the negative case, and the second being a proba output for the positive case.
+```python
+# Assuming you have embeddings from all-MiniLM-L12-v2 for the input sentences
+# E.g. produced from sentence-transformers such as:
+#      huggingface.co/sentence-transformers/all-MiniLM-L12-v2
+#      or from an ONNX version E.g. huggingface.co/Xenova/all-MiniLM-L12-v2
+print(sentences.shape)  # E.g. a batch of 1 sentence
+> (1, 384)
+import onnxruntime as ort
+sess = ort.InferenceSession("path_to_model_dot_onnx", providers=['CPUExecutionProvider'])
+outputs = [o.name for o in sess.get_outputs()]  # list of labels, in the order of the outputs
+preds_onnx = sess.run(_outputs, {'logits': _label_embeddings})
+# preds_onnx is a list with 28 entries, one per label,
+# each with a numpy array of shape (1, 2) given the input was a batch of 1
+print(outputs[0])
+> surprise
+print(preds_onnx[0])
+> array([[0.97136074, 0.02863926]], dtype=float32)
+```
+### Commentary on the dataset
+Some labels (E.g. gratitude) when considered independently perform very strongly, whilst others (E.g. relief) perform very poorly.
+This is a challenging dataset. Labels such as relief do have much fewer examples in the training data (less than 100 out of the 40k+, and only 11 in the test split).
+But there is also some ambiguity and/or labelling errors visible in the training data of go_emotions that is suspected to constrain the performance. Data cleaning on the dataset to reduce some of the mistakes, ambiguity, conflicts and duplication in the labelling would produce a higher performing model.