added bsize for open-asr eval
README.md
CHANGED
|
@@ -383,7 +383,7 @@ canary-180m-flash <br>
|
|
| 383 |
## Training Dataset:
|
| 384 |
|
| 385 |
The canary-180m-flash model is trained on a total of 85K hrs of speech data. It consists of 31K hrs of public data, 20K hrs collected by [Suno](https://suno.ai/), and 34K hrs of in-house data.
|
| 386 |
-
The datasets below include conversations, videos from the web and audiobook recordings.
|
| 387 |
|
| 388 |
**Data Collection Method:**
|
| 389 |
* Human <br>
|
|
@@ -476,7 +476,7 @@ In both ASR and AST experiments, predictions were generated using beam search wi
|
|
| 476 |
|
| 477 |
The ASR performance is measured with word error rate (WER), and we process the groundtruth and predicted text with [whisper-normalizer](https://pypi.org/project/whisper-normalizer/).
|
| 478 |
|
| 479 |
-
WER on [HuggingFace OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard):
|
| 480 |
|
| 481 |
| **Version** | **Model** | **RTFx** | **AMI** | **GigaSpeech** | **LS Clean** | **LS Other** | **Earnings22** | **SPGISpech** | **Tedlium** | **Voxpopuli** |
|
| 482 |
|:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
|
|
|
|
| 383 |
## Training Dataset:
|
| 384 |
|
| 385 |
The canary-180m-flash model is trained on a total of 85K hrs of speech data. It consists of 31K hrs of public data, 20K hrs collected by [Suno](https://suno.ai/), and 34K hrs of in-house data.
|
| 386 |
+
The datasets below include conversations, videos from the web, and audiobook recordings.
|
| 387 |
|
| 388 |
**Data Collection Method:**
|
| 389 |
* Human <br>
|
|
|
|
| 476 |
|
| 477 |
The ASR performance is measured with word error rate (WER), and we process the groundtruth and predicted text with [whisper-normalizer](https://pypi.org/project/whisper-normalizer/).
|
| 478 |
|
| 479 |
+
WER on [HuggingFace OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard) evaluated with a batch size of 128:
|
| 480 |
|
| 481 |
| **Version** | **Model** | **RTFx** | **AMI** | **GigaSpeech** | **LS Clean** | **LS Other** | **Earnings22** | **SPGISpech** | **Tedlium** | **Voxpopuli** |
|
| 482 |
|:---------:|:-----------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
|
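
The README text in this diff states that ASR performance is measured with word error rate (WER) after normalizing the ground-truth and predicted text. As a rough illustration of that metric, the sketch below computes a corpus-level WER as word-level edit distance divided by the number of reference words. It is not the leaderboard's evaluation code: `simple_normalize` is a hypothetical stand-in for whisper-normalizer (it only lowercases and strips punctuation), and the function names are assumptions made for this example.

```python
import re


def simple_normalize(text: str) -> str:
    """Hypothetical stand-in for whisper-normalizer (an assumption, not the
    real package): lowercase, replace punctuation with spaces, collapse
    whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())


def word_edit_distance(ref_words: list, hyp_words: list) -> int:
    """Levenshtein distance over words: the minimum number of substitutions,
    deletions, and insertions needed to turn the reference into the hypothesis."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(references: list, hypotheses: list) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = simple_normalize(ref).split()
        hyp_words = simple_normalize(hyp).split()
        total_errors += word_edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words after normalization")
    return total_errors / total_words
```

Because both sides are normalized first, differences in casing and punctuation do not count as errors in this sketch. A real evaluation would swap in the actual normalizer and would typically use an established WER implementation instead of this hand-written one.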