Commit 643ae12, committed by Lakoc
Parent(s): 8529a0b
Added more info about this space.
app.py CHANGED
@@ -153,7 +153,9 @@ yt_transcribe = gr.Interface(
     outputs=["html", "text"],
     title="Transcribe YouTube",
     description=(
-        "
+        """
+        ### *Currently only works on local instances of this space, as youtube-dl does not function from Hugging Face servers.*
+        Transcribe long-form YouTube videos with the click of a button! Select a model from the dropdown."""
     ),
     allow_flagging="never",
 )
@@ -161,21 +163,41 @@ yt_transcribe = gr.Interface(
 with demo:
     gr.TabbedInterface([mf_transcribe, file_transcribe, yt_transcribe], ["Microphone", "Audio file", "YouTube"])
 
-    gr.Markdown(
-        "Disclaimer: This space currently runs on basic CPU hardware, so generation might take a bit longer. "
-        "You can clone the repository and run it locally for better performance. "
-        "Please refer to the [Hugging Face documentation](https://huggingface.co/docs/hub/spaces-overview#clone-the-repository) "
-        "on how to clone the repository and run it locally. "
-        "The model is not perfect and may make errors, so please use responsibly."
-    )
-
     gr.Markdown(
         """
-
+        ## Overview
+        This space demonstrates the performance of **DeCRED** (**De**coder-**C**entric **R**egularization in **E**ncoder-**D**ecoder) for automatic speech recognition (ASR).
+        DeCRED enhances model robustness and generalization, particularly in out-of-domain scenarios, by introducing auxiliary classifiers in the decoder layers of encoder-decoder ASR architectures.
+
+        ## Key Features
+        - **Auxiliary Classifiers**: DeCRED integrates auxiliary classifiers in the decoder module to regularize training, improving the model’s ability to generalize across domains.
+        - **Enhanced Decoding**: It proposes two new decoding strategies that leverage auxiliary classifiers to re-estimate token probabilities, resulting in more accurate ASR predictions.
+        - **Strong Baseline**: Built on the **E-branchformer** architecture, DeCRED achieves competitive word error rates (WER) compared to Whisper-medium and OWSM v3 while requiring significantly less training data and a smaller model size.
+        - **Out-of-Domain Performance**: DeCRED demonstrates strong generalization, reducing WERs by 2.7 and 2.9 points on the AMI and Gigaspeech datasets, respectively.
+
+        ## Disclaimer
+        This space currently runs on basic CPU hardware, so generation might take a bit longer (approximately four times the length of the audio).
+        You can clone the repository and run it locally for better performance.
+        Please refer to the [Hugging Face documentation](https://huggingface.co/docs/hub/spaces-overview#clone-the-repository)
+        for instructions on how to clone the repository and run it locally.
+        The model is not perfect and may make errors, so please use it responsibly.
+
+        ## Explore the Models
         - [DeCRED Base](https://huggingface.co/BUT-FIT/DeCRED-base)
         - [DeCRED Small](https://huggingface.co/BUT-FIT/DeCRED-small)
         - [ED Base](https://huggingface.co/BUT-FIT/ED-base)
         - [ED Small](https://huggingface.co/BUT-FIT/ED-small)
+
+        ## Citation
+        If you use DeCRED in your research, please cite the following paper:
+
+        ```bibtex
+        @misc{polok_2024_decred,
+              title={Improving Automatic Speech Recognition with Decoder-Centric Regularization in Encoder-Decoder Models},
+              author={Alexander Polok, Santosh Kesiraju, Karel Beneš, Lukáš Burget, Jan Černocký},
+              year={2024},
+        }
+        ```
         """
     )
 
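The "Auxiliary Classifiers" feature described in the new Markdown text can be illustrated with a minimal sketch. This is not the DeCRED implementation: the function names, the single-token setup, and the `aux_weight` mixing scheme are all assumptions made for illustration. It only shows the general idea of adding cross-entropy terms from classifiers attached to intermediate decoder layers to the loss of the final layer.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, target):
    """Negative log-probability of the target token index."""
    return -log_softmax(logits)[target]


def regularized_loss(layer_logits, target, aux_weight=0.5):
    """Hypothetical combined loss for one token position.

    layer_logits: one logit vector per decoder layer that has a classifier;
    the last entry is treated as the final output layer, the others as
    auxiliary classifiers. aux_weight is an assumed mixing coefficient.
    """
    *aux, final = layer_logits
    final_loss = cross_entropy(final, target)
    if not aux:
        # No auxiliary classifiers: plain cross-entropy on the final layer.
        return final_loss
    aux_loss = sum(cross_entropy(l, target) for l in aux) / len(aux)
    return (1 - aux_weight) * final_loss + aux_weight * aux_loss


if __name__ == "__main__":
    # Two decoder layers, vocabulary of three tokens, target token index 0.
    logits_per_layer = [[1.0, 0.5, -0.2], [2.0, 0.1, -1.0]]
    print(regularized_loss(logits_per_layer, target=0))
```

With `aux_weight=0` the function reduces to ordinary cross-entropy on the final layer, which makes it easy to compare a regularized and an unregularized setup in the same code path.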