Luigi committed
Commit 9f1d629 · 1 Parent(s): 3e08001

update readme

Files changed (1):
  README.md (+51, -13)
README.md CHANGED
Old version:

@@ -1,5 +1,5 @@
---
- title: Whisper Vs Sensevoice Small
emoji: ⚡
colorFrom: gray
colorTo: purple
@@ -8,28 +8,66 @@ sdk_version: 5.31.0
app_file: app.py
pinned: false
license: mit
- short_description: Compare OpenAI Whisper against Sensevoice Small Resultssssss
---

# Whisper vs. FunASR SenseVoice Comparison

- This Space lets you compare OpenAI Whisper variants against FunAudioLLM’s SenseVoice models for automatic speech recognition (ASR), all via a simple Gradio 5 UI.

- ## 🚀 Demo

- 1. **Select Whisper model** from the dropdown.
- 2. **Select SenseVoice model** from the dropdown.
- 3. (Optional) **Toggle punctuation** for SenseVoice.
- 4. **Upload** an audio file (wav, mp3, etc.) or **record** with your microphone.
- 5. Click **Transcribe** to run both ASRs side-by-side.

## 📁 Files

- **app.py**
- Main Gradio application. Sets up two HF-ASR pipelines and displays their outputs.

- **requirements.txt**
- Python dependencies: Gradio, Transformers, Torch, Torchaudio, Accelerate, ffmpeg-python.

- - **readme.md**
- This documentation.
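The old README's "Transcribe" step runs both ASRs side-by-side on the same input. A minimal sketch of that fan-out, with stand-in engine callables (the names, signatures, and error-handling policy here are assumptions for illustration, not the Space's actual code):

```python
def transcribe_side_by_side(audio_path, engines):
    """Run every ASR engine on the same audio and collect transcripts.

    `engines` maps a display name (e.g. "Whisper", "SenseVoice") to a
    callable that takes an audio path and returns a transcript string.
    A failing engine is reported inline so one error does not abort
    the whole comparison.
    """
    results = {}
    for name, engine in engines.items():
        try:
            results[name] = engine(audio_path)
        except Exception as exc:
            results[name] = f"[error: {exc}]"
    return results


# Hypothetical usage with stand-in engines:
demo = transcribe_side_by_side(
    "sample.wav",
    {"Whisper": lambda p: "hello world", "SenseVoice": lambda p: "hello world."},
)
```

Keeping the engines behind a dict of callables means adding a third model to the comparison is one more entry, not a UI rewrite.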
New version:

---
+ title: Whisper Vs SenseVoice Small
emoji: ⚡
colorFrom: gray
colorTo: purple
app_file: app.py
pinned: false
license: mit
+ short_description: Compare OpenAI Whisper against FunAudioLLM SenseVoice
---

# Whisper vs. FunASR SenseVoice Comparison

+ This Space lets you compare OpenAI Whisper variants against FunAudioLLM’s SenseVoice models for automatic speech recognition (ASR), featuring:
+
+ - Support for multiple Whisper and SenseVoice models.
+ - Language selection for each ASR engine.
+ - **Speaker diarization enabled by default** to distinguish speakers in the transcription.
+ - **Simplified Chinese to Traditional Chinese conversion enabled by default**.
+ - Device selection for Whisper ASR (GPU or CPU).
+ - Toggle punctuation output for SenseVoice.
+ - Upload audio or record from microphone.
+ - Side-by-side transcripts and diarized transcripts display.
+
+ ## 🚀 How to Use
+
+ 1. Select a **Whisper model** and its language.
+ 2. Choose device for Whisper: GPU or CPU.
+ 3. Select a **SenseVoice model** and its language.
+ 4. Optionally toggle punctuation for SenseVoice.
+ 5. Upload an audio file or record audio via microphone.
+ 6. Both ASRs run side-by-side with speaker diarization applied by default.
+ 7. Transcripts show both the plain text and speaker-labeled diarized text.
+ 8. Transcriptions convert Simplified Chinese to Traditional Chinese automatically.

## 📁 Files

- **app.py**
+ The Gradio app code implementing ASR and diarization pipelines with UI.

- **requirements.txt**
+ Lists Python dependencies including Gradio, Transformers, Torch, pyannote.audio, pydub, opencc-python-reimplemented, and others.
+
+ - **README.md**
+ This documentation file.
+
+ ## ⚠️ Notes
+
+ - You must set your Hugging Face API token in the Space Secrets as `HF_TOKEN` for diarization to work properly (pyannote models are gated).
+ - The app falls back to an older diarization pipeline version if the newer gated version is not accessible.
+ - Simplified to Traditional Chinese conversion uses `opencc-python-reimplemented`.
+
+ ## 🛠️ Dependencies
+
+ - Python 3.8+
+ - PyTorch
+ - Transformers
+ - Gradio >= 5.0
+ - pyannote.audio (for diarization)
+ - pydub (audio processing)
+ - opencc-python-reimplemented (Chinese script conversion)
+
+ ## License
+
+ MIT License
+
+ ---
+
+ If you find issues or want to suggest improvements, please open an issue or a PR.