Wav2Vec2-Base with audio augmentation

This is the Wav2Vec2 base model pretrained on augmented speech audio sampled at 16 kHz. The audio comes from the 960-hour LibriSpeech dataset and is augmented as follows:

The ambient noise comes from MUSAN and WHAM (189 hours in total, covering music, speech, and environmental noise). The reverberation data comes from Room RIR and BUT Speech@FIT (2,650 room impulse response signals).
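The card does not say how the noise and reverb were mixed into the speech, so the following is only a minimal sketch of one common approach: additive noise at a chosen signal-to-noise ratio, and reverberation by convolving with a room impulse response. The function names `add_noise` and `add_reverb`, the SNR value, and the synthetic signals are all assumptions made for illustration; they are not taken from the model's actual training pipeline.

```python
import numpy as np


def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `speech` at the requested signal-to-noise ratio in dB."""
    # Tile or trim the noise clip so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    # Scale the noise so 10*log10(speech_power / scaled_noise_power) equals snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


def add_reverb(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve `speech` with a room impulse response, keeping the input length."""
    rir = rir / (np.sqrt(np.sum(rir ** 2)) + 1e-12)  # normalize RIR to unit energy
    return np.convolve(speech, rir, mode="full")[: len(speech)]


# Synthetic stand-ins for real data (assumed values, for illustration only).
sample_rate = 16_000
rng = np.random.default_rng(0)
t = np.arange(sample_rate) / sample_rate
speech = 0.5 * np.sin(2 * np.pi * 220.0 * t)      # 1 s placeholder "speech"
noise = rng.standard_normal(sample_rate // 2)     # 0.5 s placeholder noise clip
rir = rng.standard_normal(1600) * np.exp(-np.arange(1600) / 300.0)  # decaying RIR

augmented = add_noise(add_reverb(speech, rir), noise, snr_db=10.0)
```

In practice the noise clips would be drawn from a corpus such as MUSAN or WHAM and the impulse responses from an RIR collection, with the SNR typically sampled from a range rather than fixed.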

Model Parameters License

The model parameters are made available for non-commercial use only under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode

Contact

[email protected]


Dataset used to train nguyenvulebinh/wav2vec2-noisy