Text2Text Generation · Transformers · Safetensors · mt5 · Inference Endpoints
htdung167 committed · Commit 99af8ed · verified · 1 Parent(s): ec05cf7

Update README.md

Files changed (1): README.md +15 -17
README.md CHANGED
@@ -6,10 +6,12 @@ tags: []
  # 5CD-AI/visocial-T5-base
  ## Overview
  <!-- Provide a quick summary of what the model is/does. -->
- <!-- We continually pretrain `uitnlp/visobert` on a merged 14GB dataset, the training dataset includes:
  - Internal data (100M comments and 15M posts on Facebook)
  - UIT data, which is used to pretrain `uitnlp/visobert`
- - MC4 ecommerce -->

  Here are the results on 3 downstream tasks on Vietnamese social media texts, including Hate Speech Detection (UIT-HSD), Toxic Speech Detection (ViCTSD), Hate Spans Detection (ViHOS):
  <table>
@@ -127,21 +129,17 @@ model_path = "5CD-AI/visobert-14gb-corpus"
  mask_filler = pipeline("fill-mask", model_path)

  mask_filler("shop làm ăn như cái <mask>", top_k=10)
- ```

  ## Fine-tune Configuration
- We fine-tune `5CD-AI/visobert-14gb-corpus` on 4 downstream tasks with `transformers` library with the following configuration:
  - seed: 42
- - gradient_accumulation_steps: 1
- - weight_decay: 0.01
- - optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- - training_epochs: 30
- - model_max_length: 128
- - learning_rate: 1e-5
- - metric_for_best_model: wf1
- - strategy: epoch
-
- And different additional configurations for each task:
- | Emotion Recognition | Hate Speech Detection | Spam Reviews Detection | Hate Speech Spans Detection |
- | --- | --- | --- | --- |
- | \- train_batch_size: 64<br>\- lr_scheduler_type: linear | \- train_batch_size: 32<br>\- lr_scheduler_type: linear | \- train_batch_size: 32<br>\- lr_scheduler_type: cosine | \- train_batch_size: 32<br>\- lr_scheduler_type: cosine | -->
 
  # 5CD-AI/visocial-T5-base
  ## Overview
  <!-- Provide a quick summary of what the model is/does. -->
+ We continually pretrain `google/mt5-base` on a merged 20GB dataset (see the loading sketch after this list); the training data includes:
  - Internal data (100M comments and 15M posts on Facebook)
  - UIT data, which is used to pretrain `uitnlp/visobert`
+ - MC4 ecommerce
+ - 10.7M comments on the VOZ Forum from `tarudesu/VOZ-HSD`
+ - 3.6M Amazon reviews translated into Vietnamese from `5CD-AI/Vietnamese-amazon_polarity-gg-translated`
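
Not part of the original card: a minimal sketch, assuming the checkpoint is hosted as `5CD-AI/visocial-T5-base`, of loading it for text-to-text inference with `transformers`:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_path = "5CD-AI/visocial-T5-base"  # assumed hosted name
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

# mT5 is text-to-text: encode a prompt, generate, decode.
# A continually pretrained checkpoint is intended for fine-tuning, so raw
# generations may resemble span-corruption targets rather than answers.
inputs = tokenizer("xin chào", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```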
 
  Here are the results on 3 downstream tasks on Vietnamese social media texts, including Hate Speech Detection (UIT-HSD), Toxic Speech Detection (ViCTSD), Hate Spans Detection (ViHOS):
  <table>
 
  mask_filler = pipeline("fill-mask", model_path)

  mask_filler("shop làm ăn như cái <mask>", top_k=10)
+ ``` -->

  ## Fine-tune Configuration
+ We fine-tune `5CD-AI/visocial-T5-base` on 3 downstream tasks with the `transformers` library, using the following configuration (see the sketch after this list):
  - seed: 42
+ - training_epochs: 4
+ - train_batch_size: 4
+ - gradient_accumulation_steps: 8
+ - learning_rate: 3e-4
+ - lr_scheduler_type: linear
+ - model_max_length: 256
+ - metric_for_best_model: eval_loss
+ - evaluation_strategy: steps
+ - eval_steps: 0.1
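
A hedged sketch (not the authors' training script) of how the values above map onto `Seq2SeqTrainingArguments` in `transformers`; `output_dir`, `save_strategy`, `save_steps`, and `load_best_model_at_end` are assumptions added so `metric_for_best_model` can take effect:

```python
from transformers import Seq2SeqTrainingArguments, set_seed

set_seed(42)  # seed: 42

training_args = Seq2SeqTrainingArguments(
    output_dir="visocial-t5-base-finetune",  # assumed name, not from the card
    num_train_epochs=4,                      # training_epochs: 4
    per_device_train_batch_size=4,           # train_batch_size: 4
    gradient_accumulation_steps=8,           # effective batch size 4 * 8 = 32
    learning_rate=3e-4,
    lr_scheduler_type="linear",
    metric_for_best_model="eval_loss",
    evaluation_strategy="steps",
    eval_steps=0.1,                          # float < 1: fraction of total training steps
    save_strategy="steps",                   # assumption: must match evaluation_strategy
    save_steps=0.1,                          # assumption: checkpoint at the same cadence
    load_best_model_at_end=True,             # assumption: pairs with metric_for_best_model
)

# model_max_length: 256 is a tokenizer setting, not a TrainingArguments field:
# tokenizer = AutoTokenizer.from_pretrained(model_path, model_max_length=256)
```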