Commit 56de3dd
1 Parent(s): ed38cca
Update README.md
README.md CHANGED
@@ -87,7 +87,7 @@ processor = Wav2Vec2Processor.from_pretrained("PereLluis13/wav2vec2-large-xlsr-5
 model = Wav2Vec2ForCTC.from_pretrained("PereLluis13/wav2vec2-large-xlsr-53-greek")
 model.to("cuda")
 
-chars_to_ignore_regex = '[
+chars_to_ignore_regex = '[\\\\\\\\,\\\\\\\\?\\\\\\\\.\\\\\\\\!\\\\\\\\-\\\\\\\\;\\\\\\\\:\\\\\\\\"\\\\\\\\“\\\\\\\\%\\\\\\\\‘\\\\\\\\”\\\\\\\\�]'
 
 
 resampler = torchaudio.transforms.Resample(48_000, 16_000)
@@ -123,7 +123,7 @@ print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"],
 
 ## Training
 
-The Common Voice `train`, `validation`, and CSS10 datasets were used for training, added as `extra` split to the dataset. The sampling rate and format of the CSS10 files is different, hence the function `speech_file_to_array_fn` was changed to:
+The Common Voice `train`, `validation`, and CSS10 datasets were used for training, added as `extra` split to the dataset. The sampling rate and format of the CSS10 files is different, hence the function `speech_file_to_array_fn` was changed to:
 ```
 def speech_file_to_array_fn(batch):
     try:
@@ -139,4 +139,4 @@ The Common Voice `train`, `validation`, and CSS10 datasets were used for trainin
 
 As suggested by Florian Zimmermeister.
 
-The script used for training can be found in [run_common_voice.py](examples/research_projects/wav2vec2/run_common_voice.py), still pending of PR. The only changes are to `speech_file_to_array_fn`. Batch size was kept at 32 (using `gradient_accumulation_steps`) using one of the [OVH](https://www.ovh.com/) machines, with a V100 GPU (thank you very much [OVH](https://www.ovh.com/)). The model trained for 40 epochs, the first 20 with the `train+validation` splits, and then `extra` split was added with the data from CSS10 at the 20th epoch.
+The script used for training can be found in [run_common_voice.py](examples/research_projects/wav2vec2/run_common_voice.py), still pending of PR. The only changes are to `speech_file_to_array_fn`. Batch size was kept at 32 (using `gradient_accumulation_steps`) using one of the [OVH](https://www.ovh.com/) machines, with a V100 GPU (thank you very much [OVH](https://www.ovh.com/)). The model trained for 40 epochs, the first 20 with the `train+validation` splits, and then `extra` split was added with the data from CSS10 at the 20th epoch.
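For readers checking the first hunk: `chars_to_ignore_regex` is presumably applied to each transcript with `re.sub` before scoring, which is how a character class like this is normally used. The sketch below assumes that usage. The helper name `normalize` and the sample sentence are hypothetical, and the pattern is written with plain escaping, which is an assumption about the intended character set rather than a copy of the committed literal.

```python
import re

# Assumed intent of the character class from the diff, written as a raw
# string with plain escaping (only "-" needs a backslash inside a class).
chars_to_ignore_regex = r'[,?.!\-;:"“%‘”�]'


def normalize(sentence: str) -> str:
    """Hypothetical helper: remove ignored characters, then lowercase."""
    return re.sub(chars_to_ignore_regex, "", sentence).lower()


if __name__ == "__main__":
    # Illustrative input only; not taken from Common Voice or CSS10.
    print(normalize("Καλημέρα, κόσμε!"))
```

Only the listed punctuation is removed; whitespace and letters, including accented Greek characters, pass through unchanged apart from lowercasing.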