---
license: mit
language:
- de
metrics:
- bleu
- wer
base_model:
- openai/whisper-large-v3-turbo
pipeline_tag: automatic-speech-recognition
library_name: transformers
---

# SCRUBBED REPOSITORY
# MODEL TAKEN DOWN

Due to the licensing terms of some of the datasets used for training, the model had to be taken down.

# Whisper Large V3 Turbo (Swiss German Fine-Tuned with QLoRA)

This repository contains a fine-tuned version of OpenAI's Whisper Large V3 Turbo model, adapted specifically for Swiss German dialects using QLoRA fine-tuning. The model achieves state-of-the-art performance for Swiss German automatic speech recognition (ASR).

## Model Summary

- **Base Model**: Whisper Large V3 Turbo
- **Fine-Tuning Method**: QLoRA (8-bit precision; see the configuration sketch below)
  - **Rank**: 200
  - **Alpha**: 16
- **Hardware**: 2x NVIDIA A100 80GB GPUs
- **Training Time**: 140 hours
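
The training scripts are not included in this repository. Purely as an illustration, the sketch below shows how a QLoRA setup matching the hyperparameters above (8-bit base weights, rank 200, alpha 16) could be configured with `transformers`, `bitsandbytes`, and `peft`. The target modules and dropout value are assumptions, not the actual training configuration.

```python
# Minimal QLoRA configuration sketch. This is an assumption-based illustration,
# not the actual training script used for this model.
import torch
from transformers import AutoModelForSpeechSeq2Seq, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with 8-bit quantized weights ("8-bit precision" above).
base_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3-turbo",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
)
base_model = prepare_model_for_kbit_training(base_model)

# LoRA adapter with rank 200 and alpha 16, as listed in the model summary.
# The target modules (Whisper attention projections) are an assumption.
lora_config = LoraConfig(
    r=200,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```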

## Performance Metrics

- **Word Error Rate (WER)**: **17.5%**
- **BLEU Score**: **65.0**

The model's performance has been evaluated across multiple datasets representing diverse dialectal and demographic distributions in Swiss German.
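
For reference, WER and BLEU figures of the kind reported here can be computed with the Hugging Face `evaluate` library. The snippet below is a generic sketch with made-up sentence pairs, not the evaluation pipeline actually used for this model; note that `sacrebleu` reports scores on a 0-100 scale.

```python
# Generic WER / BLEU computation sketch using the `evaluate` library.
# The reference/prediction pairs below are illustrative only.
import evaluate

wer_metric = evaluate.load("wer")
bleu_metric = evaluate.load("sacrebleu")

references = ["heute ist das wetter schön", "wir gehen morgen in die berge"]
predictions = ["heute ist das wetter schön", "wir gehen morgen in die berge wandern"]

wer = wer_metric.compute(predictions=predictions, references=references)
bleu = bleu_metric.compute(predictions=predictions, references=[[r] for r in references])

print(f"WER:  {wer:.4f}")            # fraction of word-level errors (0-1)
print(f"BLEU: {bleu['score']:.2f}")  # sacrebleu score on a 0-100 scale
```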

### Dataset Summary

The model has been trained and evaluated on a comprehensive suite of Swiss German datasets:

1. **SDS-200 Corpus**
   - **Size**: 200 hours
   - **Description**: A corpus covering all Swiss German dialects.

2. **STT4SG-350**
   - **Size**: 343 hours
   - **Description**: Balanced distribution across Swiss German dialects and demographics, including gender representation.
   - **[Dataset Link](https://swissnlp.org/home/activities/datasets/)**

3. **SwissDial-Zh v1.1**
   - **Size**: 24 hours
   - **Description**: A dataset with balanced representation of Swiss German dialects.
   - **[Dataset Link](https://mtc.ethz.ch/publications/open-source/swiss-dial.html)**

4. **Swiss Parliament Corpus V2 (SPC)**
   - **Size**: 293 hours
   - **Description**: Parliament recordings across Swiss German dialects.
   - **[Dataset Link](https://www.cs.technik.fhnw.ch/i4ds-datasets)**

5. **ASGDTS (All Swiss German Dialects Test Set)**
   - **Size**: 13 hours
   - **Description**: A stratified dataset closely resembling real-world Swiss German dialect distribution.
   - **[Dataset Link](https://www.cs.technik.fhnw.ch/i4ds-datasets)**

## Results Across Datasets

### WER Scores

| **Model**                | **WER (All)**  | **WER SD (All)**  |
|---------------------------|----------------|--------------------|
| Turbo V3 Swiss German     | **0.1672**     | **0.1754**         |
| Large V3                 | 0.2884         | 0.2829            |
| Turbo V3                 | 0.4392         | 0.2777            |


### BLEU Scores

| **Model**                | **BLEU (All)** | **BLEU SD (All)** |
|---------------------------|----------------|--------------------|
| Turbo V3 Swiss German     | **0.65**       | **0.3149**         |
| Large V3                 | 0.5345         | 0.3453            |
| Turbo V3                 | 0.3367         | 0.2975            |


## Visual Results

### WER and BLEU Scores Across Datasets

![General Results](./general_results.png)

### WER Scores Across Datasets

![WER Scores](./wer_scores.png)

### BLEU Scores Across Datasets

![BLEU Scores](./bleu_scores.png)

## Usage

This model can be used directly with the Hugging Face Transformers library for tasks requiring Swiss German ASR.
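
Since the repository has been scrubbed, the weights can no longer be downloaded from the Hub; the snippet below is a standard `transformers` ASR pipeline sketch (not an official example from the authors) that would apply to a local copy of the checkpoint. The audio path is a placeholder.

```python
# Standard transformers ASR pipeline sketch: a generic usage example.
# If the Hub repository is unavailable, point `model` at a local checkpoint directory.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

asr = pipeline(
    "automatic-speech-recognition",
    model="nizarmichaud/whisper-large-v3-turbo-swissgerman",  # or a local path
    torch_dtype=dtype,
    device=device,
    chunk_length_s=30,  # Whisper processes audio in 30-second windows
)

# "audio.wav" is a placeholder for a Swiss German recording.
result = asr("audio.wav", generate_kwargs={"language": "german", "task": "transcribe"})
print(result["text"])
```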

## Acknowledgments

Special thanks to the creators and maintainers of the datasets used in this work:
- [Swiss NLP](https://swissnlp.org/home/activities/datasets/)
- [ETH Zurich](https://mtc.ethz.ch/publications/open-source/swiss-dial.html)
- [FHNW](https://www.cs.technik.fhnw.ch/i4ds-datasets)

And to the [University of Geneva](https://unige.ch) for granting us access to their High Performance Computing cluster, on which the model was trained.

## Citation

If you use this model in your work, please cite this repository as follows:

```bibtex
@misc{whisper-large-v3-turbo-swissgerman,
  author = {Nizar Michaud},
  title = {Whisper Large V3 Turbo Fine-Tuned for Swiss German},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/nizarmichaud/whisper-large-v3-turbo-swissgerman},
  doi = {10.57967/hf/3858}
}
```