---
library_name: transformers
tags: []
pipeline_tag: text2text-generation
widget:
- text: Dành cho <extra_id_0> hàng th <extra_id_1>iết khi mua xe tay ga và Super Cub (khách hàng mua xe <extra_id_2>1/2017).</s> 🍓 Mua góp lã <extra_id_3>ất  <extra_id_4> dẫn c <extra_id_5> từ  <extra_id_6></s> 🍓 Mua góp nhận <extra_id_7> vẹt gốc <extra_id_8></s>
  example_title: Example 1
---

# 5CD-AI/visocial-T5-base
## Overview
We trimmed the vocabulary of `google/mt5-base`[1] to 50,589 tokens and continued pretraining it on a merged 20GB dataset. The training data includes:
- Crawled data (100M comments and 15M posts from Facebook)
- The UIT data used to pretrain `uitnlp/visobert`[2]
- mC4 e-commerce data
- 10.7M comments from the VOZ Forum, taken from `tarudesu/VOZ-HSD`[7]
- 3.6M Amazon reviews[3] translated into Vietnamese, taken from `5CD-AI/Vietnamese-amazon_polarity-gg-translated`
 
The table below reports results on three downstream tasks on Vietnamese social media text: Hate Speech Detection (UIT-HSD), Toxic Speech Detection (ViCTSD), and Hate Spans Detection (ViHOS). Acc is accuracy, WF1 is weighted F1, and MF1 is macro F1.
<table>
        <tr align="center">
            <td rowspan=2><b>Model</b></td>
            <td rowspan=2><b>Average MF1</b></td>
            <td colspan=3><b>Hate Speech Detection</b></td>
            <td colspan=3><b>Toxic Speech Detection</b></td>
            <td colspan=3><b>Hate Spans Detection</b></td>
        </tr>
        <tr align="center">
            <td><b>Acc</b></td>
            <td><b>WF1</b></td>
            <td><b>MF1</b></td>
            <td><b>Acc</b></td>
            <td><b>WF1</b></td>
            <td><b>MF1</b></td>
            <td><b>Acc</b></td>
            <td><b>WF1</b></td>
            <td><b>MF1</b></td>
        </tr>
        <tr align="center">
            <td align="left">PhoBERT[4]</td>
            <td>69.63</td>
            <td>86.75</td>
            <td>86.52</td>
            <td>64.76</td>
            <td>90.78</td>
            <td>90.27</td>
            <td>71.31</td>
            <td>84.65</td>
            <td>81.12</td>
            <td>72.81</td>
        </tr>
        <tr align="center">
            <td align="left">PhoBERT_v2[4]</td>
            <td>70.50</td>
            <td>87.42</td>
            <td>87.33</td>
            <td>66.60</td>
            <td>90.23</td>
            <td>89.78</td>
            <td>71.39</td>
            <td>84.92</td>
            <td>81.51</td>
            <td>73.51</td>
        </tr>
        <tr align="center">
            <td align="left">viBERT[5]</td>
            <td>67.80</td>
            <td>86.33</td>
            <td>85.79</td>
            <td>62.85</td>
            <td>88.81</td>
            <td>88.17</td>
            <td>67.65</td>
            <td>84.63</td>
            <td>81.28</td>
            <td>72.91</td>
        </tr>
        <tr align="center">
            <td align="left">ViSoBERT[6]</td>
            <td>75.07</td>
            <td>88.17</td>
            <td>87.86</td>
            <td>67.71</td>
            <td>90.35</td>
            <td>90.16</td>
            <td>71.45</td>
            <td>90.16</td>
            <td>90.07</td>
            <td>86.04</td>
        </tr>
        <tr align="center">
            <td align="left">ViHateT5[7]</td>
            <td>75.56</td>
            <td>88.76</td>
            <td>89.14</td>
            <td>68.67</td>
            <td>90.80</td>
            <td>91.78</td>
            <td>71.63</td>
            <td>91.00</td>
            <td>90.20</td>
            <td>86.37</td>
        </tr>
        <tr align="center">
            <td align="left"><b>visocial-T5-base (Ours)</b></td>
            <td><b>78.01</b></td>
            <td><b>89.51</b></td>
            <td><b>89.78</b></td>
            <td><b>71.19</b></td>
            <td><b>92.20</b></td>
            <td><b>93.47</b></td>
            <td><b>73.81</b></td>
            <td><b>92.57</b></td>
            <td><b>92.20</b></td>
            <td><b>89.04</b></td>
        </tr>
</table>
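The Average MF1 column is not defined explicitly. A quick check, assuming it is the unweighted mean of the three per-task MF1 scores, reproduces every reported value to two decimals. The numbers below are copied from the table; the variable names are illustrative only.

```python
# Per-task Macro F1 copied from the table above, in the order
# (Hate Speech Detection, Toxic Speech Detection, Hate Spans Detection).
task_mf1 = {
    "PhoBERT": (64.76, 71.31, 72.81),
    "PhoBERT_v2": (66.60, 71.39, 73.51),
    "viBERT": (62.85, 67.65, 72.91),
    "ViSoBERT": (67.71, 71.45, 86.04),
    "ViHateT5": (68.67, 71.63, 86.37),
    "visocial-T5-base": (71.19, 73.81, 89.04),
}

# "Average MF1" column as reported in the table.
reported_avg = {
    "PhoBERT": 69.63,
    "PhoBERT_v2": 70.50,
    "viBERT": 67.80,
    "ViSoBERT": 75.07,
    "ViHateT5": 75.56,
    "visocial-T5-base": 78.01,
}

# Assumption: Average MF1 is the plain arithmetic mean of the three task scores.
computed_avg = {name: sum(scores) / len(scores) for name, scores in task_mf1.items()}

for name, value in computed_avg.items():
    print(f"{name:18s} computed={value:.2f} reported={reported_avg[name]:.2f}")
```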

The next table compares visocial-T5-base with other T5-based models on the same Vietnamese hate-speech-related tasks, measured by Macro F1-score:

<table border="1" cellspacing="0" cellpadding="5">
    <tr align="center">
        <td rowspan=2><b>Model</b></td>
        <td colspan=3><b>MF1</b></td>
    </tr>
    <tr align="center">
        <td><b>Hate Speech Detection</b></td>
        <td><b>Toxic Speech Detection</b></td>
        <td><b>Hate Spans Detection</b></td>
    </tr>
    <tr align="center">
        <td align="left">mT5[1]</td>
        <td>66.76</td>
        <td>69.93</td>
        <td>86.60</td>
    </tr>
    <tr align="center">
        <td align="left">ViT5[8]</td>
        <td>66.95</td>
        <td>64.82</td>
        <td>86.90</td>
    </tr>
    <tr align="center">
        <td align="left">ViHateT5[7]</td>
        <td>68.67</td>
        <td>71.63</td>
        <td>86.37</td>
    </tr>
    <tr align="center">
        <td align="left"><b>visocial-T5-base (Ours)</b></td>
        <td><b>71.19</b></td>
        <td><b>73.81</b></td>
        <td><b>89.04</b></td>
    </tr>
</table>
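For readers unfamiliar with the metrics, here is a minimal sketch of macro and weighted F1, assuming the standard definitions (macro F1 averages per-class F1 equally, weighted F1 averages it by class support). This is not the evaluation script used for the tables, and the toy labels are made up for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by the number of true examples per class."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Toy, imbalanced example (hypothetical labels: 0 = clean, 1 = hate).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print("macro F1:   ", round(macro_f1(y_true, y_pred), 4))
print("weighted F1:", round(weighted_f1(y_true, y_pred), 4))
```

On imbalanced data such as hate speech corpora, macro F1 is usually the lower of the two because the minority class carries the same weight as the majority class.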

<!-- ## Usage (HuggingFace Transformers)

Install `transformers` package:
    
    pip install transformers

Then you can use this model for fill-mask task like this:

```python
from transformers import pipeline

model_path = "5CD-AI/visobert-14gb-corpus"
mask_filler = pipeline("fill-mask", model_path)

mask_filler("shop làm ăn như cái <mask>", top_k=10)
``` -->
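
## Usage

The widget example in this card's metadata feeds the model text containing `<extra_id_N>` sentinel tokens, the span-corruption format inherited from mT5. The sketch below shows one way to build such an input and ask the pretrained checkpoint to fill the gaps. The helper names `mask_spans` and `fill_spans` and the example sentence are hypothetical. Loading through `AutoTokenizer` and `AutoModelForSeq2SeqLM` is an assumption based on the `transformers` library tag, not a documented recipe from the authors.

```python
MODEL_ID = "5CD-AI/visocial-T5-base"


def mask_spans(words, spans):
    """Replace each (start, end) word span with a T5 sentinel token.

    `spans` must be sorted and non-overlapping; `end` is exclusive.
    """
    out, cursor = [], 0
    for i, (start, end) in enumerate(spans):
        out.extend(words[cursor:start])
        out.append(f"<extra_id_{i}>")
        cursor = end
    out.extend(words[cursor:])
    return " ".join(out)


def fill_spans(text, max_new_tokens=32):
    """Generate the sentinel-delimited fill-ins for a masked input.

    Imported lazily so that the masking helper works without `transformers`.
    Calling this downloads the checkpoint from the Hugging Face Hub.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep special tokens so the <extra_id_N> markers stay visible in the output.
    return tokenizer.decode(output_ids[0], skip_special_tokens=False)


# Illustrative sentence; mask the fourth word.
masked = mask_spans("shop làm ăn rất uy tín".split(), [(3, 4)])
print(masked)
# To query the model (requires `transformers`, `torch`, and network access):
# print(fill_spans(masked))
```

The checkpoint is only pretrained, so tasks such as hate speech detection still need fine-tuning before the model produces task labels.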

## Fine-tune Configuration
We fine-tuned `5CD-AI/visocial-T5-base` on the three downstream tasks using the `transformers` library with the following configuration:
- seed: 42
- training_epochs: 4
- train_batch_size: 4 
- gradient_accumulation_steps: 8
- learning_rate: 3e-4
- lr_scheduler_type: linear
- model_max_length: 256
- metric_for_best_model: eval_loss
- evaluation_strategy: steps
- eval_steps: 0.1
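
As a rough sketch, the settings above can be written with the parameter names that `transformers` training arguments use. Mapping `training_epochs` to `num_train_epochs` and `train_batch_size` to `per_device_train_batch_size`, the single-device setup, and the dataset size of 24,000 examples are all assumptions made for illustration. A fractional `eval_steps` is interpreted by `transformers` as a ratio of the total training steps.

```python
import math

# Assumed mapping of the list above onto `transformers` argument names.
finetune_config = {
    "seed": 42,
    "num_train_epochs": 4,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 8,
    "learning_rate": 3e-4,
    "lr_scheduler_type": "linear",
    "metric_for_best_model": "eval_loss",
    "evaluation_strategy": "steps",  # named `eval_strategy` in newer releases
    "eval_steps": 0.1,  # a value below 1 means a fraction of total training steps
}
MODEL_MAX_LENGTH = 256  # tokenizer-side truncation length, not a training argument


def effective_batch_size(cfg, num_devices=1):
    """Examples consumed per optimizer update (num_devices=1 is an assumption)."""
    return (
        cfg["per_device_train_batch_size"]
        * cfg["gradient_accumulation_steps"]
        * num_devices
    )


def approx_eval_interval(cfg, num_examples, num_devices=1):
    """Approximate number of optimizer steps between evaluations."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size(cfg, num_devices))
    total_steps = steps_per_epoch * cfg["num_train_epochs"]
    return math.ceil(total_steps * cfg["eval_steps"])


HYPOTHETICAL_TRAIN_SIZE = 24_000  # made-up dataset size for illustration
print("effective batch size:", effective_batch_size(finetune_config))
print("eval every ~", approx_eval_interval(finetune_config, HYPOTHETICAL_TRAIN_SIZE), "steps")
```

With a batch size of 4 and 8 accumulation steps, each optimizer update sees 32 examples per device, and `eval_steps: 0.1` yields about ten evaluations over the whole run.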

## References
[1] [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934)

[2] [ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing](https://aclanthology.org/2023.emnlp-main.315/)

[3] [The Amazon Polarity dataset](https://paperswithcode.com/dataset/amazon-polarity-1)

[4] [PhoBERT: Pre-trained language models for Vietnamese](https://aclanthology.org/2020.findings-emnlp.92/)

[5] [Improving Sequence Tagging for Vietnamese Text Using Transformer-based Neural Models](https://arxiv.org/abs/2006.15994)

[6] [ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing](https://aclanthology.org/2023.emnlp-main.315/)

[7] [ViHateT5: Enhancing Hate Speech Detection in Vietnamese With A Unified Text-to-Text Transformer Model](https://arxiv.org/abs/2405.14141)

[8] [ViT5: Pretrained Text-to-Text Transformer for Vietnamese Language Generation](https://aclanthology.org/2022.naacl-srw.18/)