--- license: mit language: - de metrics: - bleu - wer base_model: - openai/whisper-large-v3-turbo pipeline_tag: automatic-speech-recognition library_name: transformers --- # Whisper Large V3 Turbo (Swiss German Fine-Tuned with QLoRa) This repository contains a fine-tuned version of OpenAI's Whisper Large V3 Turbo model, adapted specifically for Swiss German dialects using QLoRa optimization. The model achieves state-of-the-art performance for Swiss German automatic speech recognition (ASR). ## Model Summary - **Base Model**: Whisper Large V3 Turbo - **Fine-Tuning Method**: QLoRa (8-bit precision) - **Rank**: 200 - **Alpha**: 16 - **Hardware**: 2x NVIDIA A100 80GB GPUs - **Training Time**: 140 hours ## Performance Metrics - **Word Error Rate (WER)**: **17.5%** - **BLEU Score**: **65.0** The model's performance has been evaluated across multiple datasets representing diverse dialectal and demographic distributions in Swiss German. ### Dataset Summary The model has been trained and evaluated on a comprehensive suite of Swiss German datasets: 1. **SDS-200 Corpus** - **Size**: 200 hours - **Description**: A corpus covering all Swiss German dialects. 2. **STT4SG-350** - **Size**: 343 hours - **Description**: Balanced distribution across Swiss German dialects and demographics, including gender representation. - **[Dataset Link](https://swissnlp.org/home/activities/datasets/)** 3. **SwissDial-Zh v1.1** - **Size**: 24 hours - **Description**: A dataset with balanced representation of Swiss German dialects. - **[Dataset Link](https://mtc.ethz.ch/publications/open-source/swiss-dial.html)** 4. **Swiss Parliament Corpus V2 (SPC)** - **Size**: 293 hours - **Description**: Parliament recordings across Swiss German dialects. - **[Dataset Link](https://www.cs.technik.fhnw.ch/i4ds-datasets)** 5. **ASGDTS (All Swiss German Dialects Test Set)** - **Size**: 13 hours - **Description**: A stratified dataset closely resembling real-world Swiss German dialect distribution. - **[Dataset Link](https://www.cs.technik.fhnw.ch/i4ds-datasets)** ## Results Across Datasets ### WER Scores | **Model** | **WER (All)** | **WER SD (All)** | |---------------------------|----------------|--------------------| | Turbo V3 Swiss German | **0.1672** | **0.1754** | | Large V3 | 0.2884 | 0.2829 | | Turbo V3 | 0.4392 | 0.2777 | ### BLEU Scores | **Model** | **BLEU (All)** | **BLEU SD (All)** | |---------------------------|----------------|--------------------| | Turbo V3 Swiss German | **0.65** | **0.3149** | | Large V3 | 0.5345 | 0.3453 | | Turbo V3 | 0.3367 | 0.2975 | ## Visual Results ### WER and BLEU Scores Across Datasets ![General Results](./general_results.png) ### WER Scores Across Datasets ![WER Scores](./wer_scores.png) ### BLEU Scores Across Datasets ![BLEU Scores](./bleu_scores.png) ## Usage This model can be used directly with the Hugging Face Transformers library for tasks requiring Swiss German ASR. ## Acknowledgments Special thanks to the creators and maintainers of the datasets used in this work: - [Swiss NLP](https://swissnlp.org/home/activities/datasets/) - [ETH Zurich](https://mtc.ethz.ch/publications/open-source/swiss-dial.html) - [FHNW](https://www.cs.technik.fhnw.ch/i4ds-datasets) And to the [University of Geneva](https://unige.ch) for allowing us access to their High Performance Computing cluster on which the model has been trained. ## Citation If you use this model in your work, please cite this repository as follows: ```bibtex @misc{whisper-large-v3-turbo-swissgerman, author = {Nizar Michaud}, title = {Whisper Large V3 Turbo Fine-Tuned for Swiss German}, year = {2024}, publisher = {Hugging Face}, url = {https://huggingface.co/nizarmichaud/whisper-large-v3-turbo-swissgerman}, doi = 10.57967/hf/3858, }