PereLluis13/wav2vec2-large-xlsr-53-greek

Automatic Speech Recognition

xlsr-fine-tuning-week


PereLluis13 committed on Mar 24, 2021

Commit 56de3dd · 1 Parent(s): ed38cca

Update README.md

Files changed (1)

README.md +3 -3


@@ -87,7 +87,7 @@ processor = Wav2Vec2Processor.from_pretrained("PereLluis13/wav2vec2-large-xlsr-5
 model = Wav2Vec2ForCTC.from_pretrained("PereLluis13/wav2vec2-large-xlsr-53-greek")
 model.to("cuda")
-chars_to_ignore_regex = '[\\\\,\\\\?\\\\.\\\\!\\\\-\\\\;\\\\:\\\\"\\\\“\\\\%\\\\‘\\\\”\\\\�]'
 resampler = torchaudio.transforms.Resample(48_000, 16_000)
@@ -123,7 +123,7 @@ print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"],
 ## Training
-The Common Voice `train`, `validation`, and CSS10 datasets were used for training, added as `extra` split to the dataset. The sampling rate and format of the CSS10 files is different, hence the function `speech_file_to_array_fn` was changed to: # TODO: adapt to state all the datasets that were used for training.
 ```
     def speech_file_to_array_fn(batch):
         try:
@@ -139,4 +139,4 @@ The Common Voice `train`, `validation`, and CSS10 datasets were used for trainin
 As suggested by Florian Zimmermeister.
-The script used for training can be found in [run_common_voice.py](examples/research_projects/wav2vec2/run_common_voice.py), still pending of PR. The only changes are to `speech_file_to_array_fn`. Batch size was kept at 32 (using `gradient_accumulation_steps`) using one of the [OVH](https://www.ovh.com/) machines, with a V100 GPU (thank you very much [OVH](https://www.ovh.com/)). The model trained for 40 epochs, the first 20 with the `train+validation` splits, and then `extra` split was added with the data from CSS10 at the 20th epoch. # TODO: fill in a link to your training script here. If you trained your model in a colab, simply fill in the link here. If you trained the model locally, it would be great if you could upload the training script on github and paste the link here.

 model = Wav2Vec2ForCTC.from_pretrained("PereLluis13/wav2vec2-large-xlsr-53-greek")
 model.to("cuda")
+chars_to_ignore_regex = '[\\\\\\\\,\\\\\\\\?\\\\\\\\.\\\\\\\\!\\\\\\\\-\\\\\\\\;\\\\\\\\:\\\\\\\\"\\\\\\\\“\\\\\\\\%\\\\\\\\‘\\\\\\\\”\\\\\\\\�]'
 resampler = torchaudio.transforms.Resample(48_000, 16_000)
 ## Training
+The Common Voice `train`, `validation`, and CSS10 datasets were used for training, added as `extra` split to the dataset. The sampling rate and format of the CSS10 files is different, hence the function `speech_file_to_array_fn` was changed to:
 ```
     def speech_file_to_array_fn(batch):
         try:
 As suggested by Florian Zimmermeister.
+The script used for training can be found in [run_common_voice.py](examples/research_projects/wav2vec2/run_common_voice.py), still pending of PR. The only changes are to `speech_file_to_array_fn`. Batch size was kept at 32 (using `gradient_accumulation_steps`) using one of the [OVH](https://www.ovh.com/) machines, with a V100 GPU (thank you very much [OVH](https://www.ovh.com/)). The model trained for 40 epochs, the first 20 with the `train+validation` splits, and then `extra` split was added with the data from CSS10 at the 20th epoch.
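
For reviewers: the `chars_to_ignore_regex` line gains more backslashes in this commit, and stacked escapes like these are easy to get wrong. If the pattern really contains doubled backslashes as rendered here, Python's `re` would read `\\-\\` inside a character class as a range between two backslashes, not as a literal hyphen. Below is a minimal sketch of the same punctuation filter written as a raw string. It assumes the intent is to strip exactly the characters listed in the pattern. The names `CHARS_TO_IGNORE` and `normalize_sentence` are hypothetical and do not come from the README.

```python
import re

# Assumption: the goal is to drop the punctuation listed in the README pattern.
# Inside a character class only the hyphen needs escaping; \ufffd is the
# Unicode replacement character that appears at the end of the original pattern.
CHARS_TO_IGNORE = r'[,?.!\-;:"“%‘”\ufffd]'


def normalize_sentence(sentence: str) -> str:
    """Remove the ignored punctuation and lowercase the text."""
    return re.sub(CHARS_TO_IGNORE, "", sentence).lower()


if __name__ == "__main__":
    print(normalize_sentence("Καλημέρα, κόσμε!"))
```

A raw string keeps the pattern readable and avoids re-escaping it each time the README passes through another rendering step.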