---
library_name: transformers
language:
  - yue
license: cc-by-4.0
tags:
  - generated_from_trainer
pipeline_tag: fill-mask
widget:
  - text: 香港原本[MASK]一個人煙稀少嘅漁港。
    example_title: 
model-index:
  - name: bert-large-cantonese
    results: []
---

# bert-large-cantonese

## Description

This model is trained from scratch on Cantonese text. It uses the BERT-large architecture (24 layers, 1024 hidden size, 16 attention heads, 326M parameters).

Training proceeded in two stages: the model was first pre-trained on sequences of length 128 with a batch size of 512 for one epoch, then pre-training continued on sequences of length 512 with a batch size of 512 for one more epoch.

## How to use

You can use this model directly with a pipeline for masked language modeling:

```python
from transformers import pipeline

mask_filler = pipeline(
    "fill-mask",
    model="hon9kon9ize/bert-large-cantonese"
)

mask_filler("雞蛋六隻,糖呢就兩茶匙,仲有[MASK]橙皮添。")

# [{'score': 0.08160534501075745,
#   'token': 943,
#   'token_str': '個',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 個 橙 皮 添 。'},
#  {'score': 0.06182105466723442,
#   'token': 1576,
#   'token_str': '啲',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 啲 橙 皮 添 。'},
#  {'score': 0.04600336775183678,
#   'token': 1646,
#   'token_str': '嘅',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 嘅 橙 皮 添 。'},
#  {'score': 0.03743772581219673,
#   'token': 3581,
#   'token_str': '橙',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 橙 橙 皮 添 。'},
#  {'score': 0.031560592353343964,
#   'token': 5148,
#   'token_str': '紅',
#   'sequence': '雞 蛋 六 隻 , 糖 呢 就 兩 茶 匙 , 仲 有 紅 橙 皮 添 。'}]
```
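
Alternatively, you can load the checkpoint directly with `AutoTokenizer` and `AutoModelForMaskedLM` and score the `[MASK]` candidates yourself. The snippet below is a minimal sketch of that approach:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("hon9kon9ize/bert-large-cantonese")
model = AutoModelForMaskedLM.from_pretrained("hon9kon9ize/bert-large-cantonese")

text = "雞蛋六隻,糖呢就兩茶匙,仲有[MASK]橙皮添。"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the position of the [MASK] token and take the five highest-scoring candidates.
mask_positions = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_positions[0]].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```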

## Training hyperparameters

The following hyperparameters were used during the first training stage:

- Batch size: 512
- Learning rate: 1e-4
- Learning rate scheduler: linear decay
- Epochs: 1
- Warmup ratio: 0.1

Loss plot on [Weights & Biases](https://api.wandb.ai/links/indiejoseph/v3ljlpmp)

The following hyperparameters were used during the second training stage:

- Batch size: 512
- Learning rate: 5e-5
- Learning rate scheduler: linear decay
- Epochs: 1
- Warmup ratio: 0.1

Loss plot on [Weights & Biases](https://api.wandb.ai/links/indiejoseph/vcm3q1ef)
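
For reference, here is a minimal sketch of how these hyperparameters could be expressed with the `transformers` `Trainer` API. The toy corpus, output directory, and masking probability are illustrative placeholders, not the authors' actual training setup:

```python
from transformers import (
    AutoTokenizer,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("hon9kon9ize/bert-large-cantonese")
# Randomly initialised model with the published configuration (training from scratch).
model = BertForMaskedLM(BertConfig.from_pretrained("hon9kon9ize/bert-large-cantonese"))

# Toy corpus standing in for the Cantonese pre-training data, which is not included here.
texts = ["香港原本係一個人煙稀少嘅漁港。"]
train_dataset = [tokenizer(t, truncation=True, max_length=128) for t in texts]

# Stage 1 settings; stage 2 reused the same arguments with learning_rate=5e-5
# and sequences of length 512.
args = TrainingArguments(
    output_dir="bert-large-cantonese-stage1",  # placeholder path
    per_device_train_batch_size=512,           # effective batch size of 512
    learning_rate=1e-4,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=1,
)

# Standard masked-language-modelling collator (15% masking is the BERT default).
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()
```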