Commit
·
910d5d8
1
Parent(s):
a77b1eb
Update README.md
README.md
CHANGED
@@ -1,3 +1,31 @@
1 |
---
|
2 |
license: apache-2.0
|
3 |
---
|
1 |
---
|
2 |
+
language:
|
3 |
+
- multilingual
|
4 |
+
- ain
|
5 |
license: apache-2.0
|
6 |
---
|
7 |
+
|
8 |
+
## Wav2Vec2-Large-XLSR-53 fine-tuned for automatic transcription of Sakhalin Ainu
|
9 |
+
|
10 |
+
This is a [wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) model after continued pretraining on Hokkaido Ainu and Sakhalin Ainu speech data (see [wav2vec2-large-xlsr-53-pretrain-ain](https://huggingface.co/karolnowakowski/wav2vec2-large-xlsr-53-pretrain-ain)) and fine-tuning for automatic speech recognition on 10 hours of labeled Sakhalin Ainu data.
|
11 |
+
For details, please refer to the [paper](https://arxiv.org/abs/2301.07295).
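
A minimal usage sketch, assuming the fine-tuned checkpoint ships a CTC head and processor files loadable with `Wav2Vec2ForCTC` and `Wav2Vec2Processor` from the `transformers` library, and that input audio is mono at 16 kHz. `MODEL_ID` is a placeholder (the repository id of this model is not stated here), and `ctc_greedy_collapse` is a hypothetical helper that only illustrates what greedy CTC decoding does to the predicted ids.

```python
from typing import List, Sequence

# Placeholder, not a real repository id: replace with this model's id on the Hub.
MODEL_ID = "your-namespace/your-finetuned-sakhalin-ainu-model"


def ctc_greedy_collapse(ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Merge consecutive repeats, then drop blanks (greedy CTC decoding).

    Illustrative only; blank_id=0 is an assumption. In practice the
    processor's batch_decode performs this step using the tokenizer's
    own blank/pad id.
    """
    collapsed: List[int] = []
    previous = None
    for token_id in ids:
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


def transcribe(waveform: Sequence[float], model_id: str = MODEL_ID) -> str:
    """Transcribe a mono 16 kHz waveform (1-D sequence of floats)."""
    # Imported lazily so the helper above works without torch/transformers.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)

    inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Most likely token per frame; batch_decode collapses repeats and blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Audio recorded at another sampling rate would need resampling to 16 kHz before calling `transcribe`.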
|
12 |
+
|
13 |
+
|
14 |
+
## Citation
|
15 |
+
When using the model, please cite the following paper:
|
16 |
+
```bibtex
|
17 |
+
@article{NOWAKOWSKI2023103148,
|
18 |
+
title = {Adapting multilingual speech representation model for a new, underresourced language through multilingual fine-tuning and continued pretraining},
|
19 |
+
journal = {Information Processing \& Management},
|
20 |
+
volume = {60},
|
21 |
+
number = {2},
|
22 |
+
pages = {103148},
|
23 |
+
year = {2023},
|
24 |
+
issn = {0306-4573},
|
25 |
+
doi = {10.1016/j.ipm.2022.103148},
|
26 |
+
url = {https://www.sciencedirect.com/science/article/pii/S0306457322002497},
|
27 |
+
author = {Karol Nowakowski and Michal Ptaszynski and Kyoko Murasaki and Jagna Nieuważny},
|
28 |
+
keywords = {Automatic speech transcription, ASR, Wav2vec 2.0, Pretrained transformer models, Speech representation models, Cross-lingual transfer, Language documentation, Endangered languages, Underresourced languages, Sakhalin Ainu},
|
29 |
+
abstract = {In recent years, neural models learned through self-supervised pretraining on large scale multilingual text or speech data have exhibited promising results for underresourced languages, especially when a relatively large amount of data from related language(s) is available. While the technology has a potential for facilitating tasks carried out in language documentation projects, such as speech transcription, pretraining a multilingual model from scratch for every new language would be highly impractical. We investigate the possibility for adapting an existing multilingual wav2vec 2.0 model for a new language, focusing on actual fieldwork data from a critically endangered tongue: Ainu. Specifically, we (i) examine the feasibility of leveraging data from similar languages also in fine-tuning; (ii) verify whether the model’s performance can be improved by further pretraining on target language data. Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language and leads to considerable reduction in error rates. Furthermore, we find that if a model pretrained on a related speech variety or an unrelated language with similar phonological characteristics is available, multilingual fine-tuning using additional data from that language can have positive impact on speech recognition performance when there is very little labeled data in the target language.}
|
30 |
+
}
|
31 |
+
```
|