---
license: apache-2.0
language:
- vi
tags:
- automatic-speech-recognition
- robust-speech-event
- common-voice
- vi
model-index:
- name: xls-asr-vi-40h-1B
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 7.0
      type: mozilla-foundation/common_voice_7_0
      args: vi
    metrics:
      - name: Test WER (with LM)
        type: wer
        value: 28.269
      - name: Test CER (with LM)
        type: cer
        value: 14.421
---


# xls-asr-vi-40h-1B

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) for Vietnamese automatic speech recognition, trained on 40 hours of the FPT Open Speech Dataset (FOSD).
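
A minimal inference sketch (assuming the repository ships a matching `Wav2Vec2Processor` configuration; the audio path and 16 kHz mono input below are placeholders):

```python
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Load the fine-tuned checkpoint from the Hugging Face Hub.
processor = Wav2Vec2Processor.from_pretrained("geninhu/xls-asr-vi-40h-1B")
model = Wav2Vec2ForCTC.from_pretrained("geninhu/xls-asr-vi-40h-1B")

# "audio.wav" is a placeholder; the model expects 16 kHz mono audio.
speech, _ = librosa.load("audio.wav", sr=16_000)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding (no language model).
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```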

### Benchmark WER results
|                | [VIVOS](https://huggingface.co/datasets/vivos) | [COMMON VOICE 7.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0) | [COMMON VOICE 8.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) |
|---|---|---|---|
| without LM     | 25.93 | 34.21 |        |
| with 4-gram LM | 24.11 | 28.27 | 31.158 |

### Benchmark CER results
|                | [VIVOS](https://huggingface.co/datasets/vivos) | [COMMON VOICE 7.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0) | [COMMON VOICE 8.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) |
|---|---|---|---|
| without LM     | 9.24  | 19.94 |        |
| with 4-gram LM | 10.37 | 14.42 | 16.179 |
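
The "with 4-gram LM" rows come from beam-search decoding over a KenLM language model. Below is a hedged sketch of how such decoding is typically wired up via `Wav2Vec2ProcessorWithLM` and `pyctcdecode`; whether this repository actually bundles the LM files (`alphabet.json`, `language_model/`) is an assumption, so you may need to build a decoder from your own 4-gram ARPA file instead:

```python
import torch
import librosa
from transformers import Wav2Vec2ProcessorWithLM, Wav2Vec2ForCTC

# Assumes the repo ships a KenLM-backed decoder; otherwise construct a
# pyctcdecode BeamSearchDecoderCTC from your own ARPA/binary LM.
processor = Wav2Vec2ProcessorWithLM.from_pretrained("geninhu/xls-asr-vi-40h-1B")
model = Wav2Vec2ForCTC.from_pretrained("geninhu/xls-asr-vi-40h-1B")

speech, _ = librosa.load("audio.wav", sr=16_000)  # placeholder path, 16 kHz mono
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# LM-boosted beam search decodes the raw logits (as a numpy array).
print(processor.batch_decode(logits.numpy()).text[0])
```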

## Evaluation

Run the evaluation script (`eval_custom.py`) on the Common Voice 7.0 test split:

```bash
python eval_custom.py --model_id geninhu/xls-asr-vi-40h-1B --dataset mozilla-foundation/common_voice_7_0 --config vi --split test --log_outputs
```
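
The WER and CER numbers above can be reproduced at a small scale with the metric loaders from `datasets` (the version listed under framework versions below); this is a generic sketch with made-up strings, and the actual `eval_custom.py` may apply additional text normalization:

```python
from datasets import load_metric

# Hypothetical strings; in practice these are the model transcriptions and the
# reference sentences of the test split.
predictions = ["xin chào việt nam"]
references = ["xin chào việt nam"]

wer = load_metric("wer").compute(predictions=predictions, references=references)
cer = load_metric("cer").compute(predictions=predictions, references=references)
print(f"WER: {wer:.3f}  CER: {cer:.3f}")
```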

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1500
- num_epochs: 10.0
- mixed_precision_training: Native AMP
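
A rough reconstruction of the corresponding `transformers.TrainingArguments`, built only from the values listed above (`output_dir` and anything not listed, such as save/eval cadence, are placeholders):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="xls-asr-vi-40h-1B",   # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,    # effective train batch size: 32
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=1500,
    num_train_epochs=10.0,
    fp16=True,                        # "Native AMP" mixed-precision training
)
```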

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Wer    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 4.6222        | 1.85  | 1500  | 5.9479          | 0.5474 |
| 1.1362        | 3.7   | 3000  | 7.9799          | 0.5094 |
| 0.7814        | 5.56  | 4500  | 5.0330          | 0.4724 |
| 0.6281        | 7.41  | 6000  | 2.3484          | 0.5020 |
| 0.5472        | 9.26  | 7500  | 2.2495          | 0.4793 |
| 0.4827        | 11.11 | 9000  | 1.1530          | 0.4768 |
| 0.4327        | 12.96 | 10500 | 1.6160          | 0.4646 |
| 0.3989        | 14.81 | 12000 | 3.2633          | 0.4703 |
| 0.3522        | 16.67 | 13500 | 2.2337          | 0.4708 |
| 0.3201        | 18.52 | 15000 | 3.6879          | 0.4565 |
| 0.2899        | 20.37 | 16500 | 5.4389          | 0.4599 |
| 0.2776        | 22.22 | 18000 | 3.5284          | 0.4537 |
| 0.2574        | 24.07 | 19500 | 2.1759          | 0.4649 |
| 0.2378        | 25.93 | 21000 | 3.3901          | 0.4448 |
| 0.217         | 27.78 | 22500 | 1.1632          | 0.4565 |
| 0.2115        | 29.63 | 24000 | 1.7441          | 0.4232 |
| 0.1959        | 31.48 | 25500 | 3.4992          | 0.4304 |
| 0.187         | 33.33 | 27000 | 3.6163          | 0.4369 |
| 0.1748        | 35.19 | 28500 | 3.6038          | 0.4467 |
| 0.17          | 37.04 | 30000 | 2.9708          | 0.4362 |
| 0.159         | 38.89 | 31500 | 3.2045          | 0.4279 |
| 0.153         | 40.74 | 33000 | 3.2427          | 0.4287 |
| 0.1463        | 42.59 | 34500 | 3.5439          | 0.4270 |
| 0.139         | 44.44 | 36000 | 3.9381          | 0.4150 |
| 0.1352        | 46.3  | 37500 | 4.1744          | 0.4092 |
| 0.1369        | 48.15 | 39000 | 4.2279          | 0.4154 |
| 0.1273        | 50.0  | 40500 | 4.1691          | 0.4133 |


### Framework versions

- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.11.0