Update README.md
---
language: pl
datasets:
- common_voice
metrics:
- wer
tags:
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: mbien/wav2vec2-large-xlsr-polish
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice pl
      type: common_voice
      args: pl
    metrics:
    - name: Test WER
      type: wer
      value: {wer_result_on_test} #TODO (IMPORTANT): replace {wer_result_on_test} with the WER you achieved on the common_voice test set, in the format XX.XX (without the % sign). Fill this in after evaluating the model so that it appears on the leaderboard.
---

# Wav2Vec2-Large-XLSR-53-Polish

Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Polish using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset.
When using this model, make sure that your speech input is sampled at 16 kHz.
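
If your recordings are at a different rate, resample them before passing them to the processor. A minimal sketch with torchaudio; the file name is a placeholder:

```python
import torchaudio

# "sample.wav" is a hypothetical local recording.
speech_array, sampling_rate = torchaudio.load("sample.wav")
# The model expects 16 kHz input; resample if needed.
if sampling_rate != 16_000:
    speech_array = torchaudio.transforms.Resample(sampling_rate, 16_000)(speech_array)
```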

## Usage

The model can be used directly (without a language model) as follows:

```python
import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

test_dataset = load_dataset("common_voice", "pl", split="test[:2%]")

processor = Wav2Vec2Processor.from_pretrained("mbien/wav2vec2-large-xlsr-polish")
model = Wav2Vec2ForCTC.from_pretrained("mbien/wav2vec2-large-xlsr-polish")

resampler = torchaudio.transforms.Resample(48_000, 16_000)

# ... (audio loading and inference steps elided in this diff)

print("Prediction:", processor.batch_decode(predicted_ids))
print("Reference:", test_dataset["sentence"][:2])
```
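
To transcribe a single local recording rather than a Common Voice sample, the same pieces can be combined as follows. A minimal sketch; the file path is a placeholder and the audio is assumed to already be at 16 kHz (resample as shown above otherwise):

```python
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("mbien/wav2vec2-large-xlsr-polish")
model = Wav2Vec2ForCTC.from_pretrained("mbien/wav2vec2-large-xlsr-polish")

# "recording.wav" is a hypothetical 16 kHz mono file.
speech_array, _ = torchaudio.load("recording.wav")

inputs = processor(speech_array.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```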

## Evaluation

The model can be evaluated as follows on the Polish test data of Common Voice.

```python
import torch
import torchaudio
from datasets import load_dataset, load_metric
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import re

test_dataset = load_dataset("common_voice", "pl", split="test")
wer = load_metric("wer")

processor = Wav2Vec2Processor.from_pretrained("mbien/wav2vec2-large-xlsr-polish")
model = Wav2Vec2ForCTC.from_pretrained("mbien/wav2vec2-large-xlsr-polish")
model.to("cuda")

# Characters stripped from the reference transcripts before scoring.
chars_to_ignore_regex = '[\—\…\,\?\.\!\-\;\:\"\“\„\%\‘\”\�\«\»\'\’]'
resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocessing the datasets.
# ... (preprocessing and batched inference elided in this diff)

print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
```
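
For intuition, `chars_to_ignore_regex` drives the transcript normalization: punctuation is removed and, in the standard evaluation script, the text is lowercased, so WER is computed on bare words. A small illustration; the sample sentence is arbitrary:

```python
import re

chars_to_ignore_regex = '[\—\…\,\?\.\!\-\;\:\"\“\„\%\‘\”\�\«\»\'\’]'
# Strip ignored characters and lowercase, as in the evaluation script.
print(re.sub(chars_to_ignore_regex, '', 'Cześć, jak się masz?').lower())
# cześć jak się masz
```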

## Training

The Common Voice `train` and `validation` datasets were used for training.
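
For reference, these splits can be loaded like this; a minimal sketch, not the full training pipeline:

```python
from datasets import load_dataset

# Combine the train and validation splits, as used for fine-tuning.
train_dataset = load_dataset("common_voice", "pl", split="train+validation")
test_dataset = load_dataset("common_voice", "pl", split="test")
```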

The script used for training can be found [here](https://colab.research.google.com/drive/1DvrFMoKp9h3zk_eXrJF2s4_TGDHh0tMc?usp=sharing).