---
library_name: transformers
language:
- yue
license: cc-by-4.0
tags:
- generated_from_trainer
pipeline_tag: fill-mask
widget:
- text: 香港原本[MASK]一個人煙稀少嘅漁港。
example_title: 係
model-index:
- name: bert-large-cantonese
results: []
---
# bert-large-cantonese
## Description
This model is trained from scratch on Cantonese text. It is a BERT model with a large architecture (24-layer, 1024-hidden, 16-heads, 326M parameters).
The first training stage pre-trains the model on 128-token sequences with a batch size of 512 for one epoch. The second stage continues pre-training on 512-token sequences with a batch size of 512 for one more epoch.
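As a rough illustration of the data side of these two stages (the actual training scripts and corpus are not published in this card; the sample text and preprocessing below are assumptions for illustration only), the setup could look like:
```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("hon9kon9ize/bert-large-cantonese")

# Illustrative corpus only; the real pre-training data is not part of this card.
texts = ["香港原本係一個人煙稀少嘅漁港。"]

# Stage 1: truncate to 128 tokens; stage 2 repeats the same preprocessing
# with max_length=512 before the second epoch of pre-training.
stage1_batch = tokenizer(texts, truncation=True, max_length=128)
stage2_batch = tokenizer(texts, truncation=True, max_length=512)

# Standard BERT-style masked-language-modelling collator (15% token masking assumed).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
```
Training mostly on short sequences first and only then extending to 512 tokens follows the original BERT recipe, since attention cost grows quickly with sequence length.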
## How to use
You can use this model directly with a pipeline for masked language modeling:
```python
from transformers import pipeline
mask_filler = pipeline(
    "fill-mask",
    model="hon9kon9ize/bert-large-cantonese"
)
mask_filler("雞蛋六隻,糖呢就兩茶匙,仲有[MASK]橙皮添。")
# [{'score': 0.08160534501075745,
#   'token': 943,
#   'token_str': '個',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 個 橙 皮 添 。'},
#  {'score': 0.06182105466723442,
#   'token': 1576,
#   'token_str': '啲',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 啲 橙 皮 添 。'},
#  {'score': 0.04600336775183678,
#   'token': 1646,
#   'token_str': '嘅',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 嘅 橙 皮 添 。'},
#  {'score': 0.03743772581219673,
#   'token': 3581,
#   'token_str': '橙',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 橙 橙 皮 添 。'},
#  {'score': 0.031560592353343964,
#   'token': 5148,
#   'token_str': '紅',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 紅 橙 皮 添 。'}]
```
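Alternatively, you can load the tokenizer and model directly and inspect the logits yourself. The snippet below is a minimal sketch assuming a PyTorch environment:
```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("hon9kon9ize/bert-large-cantonese")
model = AutoModelForMaskedLM.from_pretrained("hon9kon9ize/bert-large-cantonese")

inputs = tokenizer("雞蛋六隻,糖呢就兩茶匙,仲有[MASK]橙皮添。", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the five highest-scoring tokens there.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_positions[0]].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```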
## Training hyperparameters
The following hyperparameters were used during the first training stage:
- Batch size: 512
- Learning rate: 1e-4
- Learning rate scheduler: linear decay
- Epochs: 1
- Warmup ratio: 0.1

Loss plot on [WandB](https://api.wandb.ai/links/indiejoseph/v3ljlpmp)

The following hyperparameters were used during the second training stage:
- Batch size: 512
- Learning rate: 5e-5
- Learning rate scheduler: linear decay
- Epochs: 1
- Warmup ratio: 0.1

Loss plot on [WandB](https://api.wandb.ai/links/indiejoseph/vcm3q1ef)
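For reference, these settings map onto the transformers Trainer API roughly as follows. This is a sketch, not the published training script, and the per-device/accumulation split of the global batch size of 512 is an assumption:
```python
from transformers import TrainingArguments

# Sketch of the first-stage settings; the second stage would use
# learning_rate=5e-5 and 512-token inputs. The batch split below is assumed.
training_args = TrainingArguments(
    output_dir="bert-large-cantonese-stage1",  # hypothetical output path
    per_device_train_batch_size=64,
    gradient_accumulation_steps=8,             # 64 * 8 = effective batch size 512
    learning_rate=1e-4,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
)
```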