Luigi committed
Commit 9f1d629 · 1 Parent(s): 3e08001

update readme

Files changed (1):
  README.md (+51, -13)
README.md CHANGED
Old version:

@@ -1,5 +1,5 @@
---
- title: Whisper Vs Sensevoice Small
emoji: ⚡
colorFrom: gray
colorTo: purple
@@ -8,28 +8,66 @@ sdk_version: 5.31.0
app_file: app.py
pinned: false
license: mit
- short_description: Compare OpenAI Whisper against Sensevoice Small Resultssssss
---

# Whisper vs. FunASR SenseVoice Comparison

- This Space lets you compare OpenAI Whisper variants against FunAudioLLM’s SenseVoice models for automatic speech recognition (ASR), all via a simple Gradio 5 UI.

- ## 🚀 Demo

- 1. **Select Whisper model** from the dropdown.
- 2. **Select SenseVoice model** from the dropdown.
- 3. (Optional) **Toggle punctuation** for SenseVoice.
- 4. **Upload** an audio file (wav, mp3, etc.) or **record** with your microphone.
- 5. Click **Transcribe** to run both ASRs side-by-side.

## 📁 Files

- **app.py**
- Main Gradio application. Sets up two HF-ASR pipelines and displays their outputs.

- **requirements.txt**
- Python dependencies: Gradio, Transformers, Torch, Torchaudio, Accelerate, ffmpeg-python.

- - **readme.md**
- This documentation.
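The old README's "Transcribe" step runs both ASRs side-by-side on the same input. A minimal sketch of that fan-out, with stand-in engine callables (the names, signatures, and error-handling policy here are assumptions for illustration, not the Space's actual code):

```python
def transcribe_side_by_side(audio_path, engines):
    """Run every ASR engine on the same audio and collect transcripts.

    `engines` maps a display name (e.g. "Whisper", "SenseVoice") to a
    callable that takes an audio path and returns a transcript string.
    A failing engine is reported inline so one error does not abort
    the whole comparison.
    """
    results = {}
    for name, engine in engines.items():
        try:
            results[name] = engine(audio_path)
        except Exception as exc:
            results[name] = f"[error: {exc}]"
    return results


# Hypothetical usage with stand-in engines:
demo = transcribe_side_by_side(
    "sample.wav",
    {"Whisper": lambda p: "hello world", "SenseVoice": lambda p: "hello world."},
)
```

Keeping the engines behind a dict of callables means adding a third model to the comparison is one more entry, not a UI rewrite.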
New version:

---
+ title: Whisper Vs SenseVoice Small
emoji: ⚡
colorFrom: gray
colorTo: purple
app_file: app.py
pinned: false
license: mit
+ short_description: Compare OpenAI Whisper against FunAudioLLM SenseVoice
---

# Whisper vs. FunASR SenseVoice Comparison

+ This Space lets you compare OpenAI Whisper variants against FunAudioLLM’s SenseVoice models for automatic speech recognition (ASR), featuring:
+
+ - Support for multiple Whisper and SenseVoice models.
+ - Language selection for each ASR engine.
+ - **Speaker diarization enabled by default** to distinguish speakers in the transcription.
+ - **Simplified Chinese to Traditional Chinese conversion enabled by default**.
+ - Device selection for Whisper ASR (GPU or CPU).
+ - Toggle punctuation output for SenseVoice.
+ - Upload audio or record from microphone.
+ - Side-by-side transcripts and diarized transcripts display.
+
+ ## 🚀 How to Use
+
+ 1. Select a **Whisper model** and its language.
+ 2. Choose device for Whisper: GPU or CPU.
+ 3. Select a **SenseVoice model** and its language.
+ 4. Optionally toggle punctuation for SenseVoice.
+ 5. Upload an audio file or record audio via microphone.
+ 6. Both ASRs run side-by-side with speaker diarization applied by default.
+ 7. Transcripts show both the plain text and speaker-labeled diarized text.
+ 8. Transcriptions convert Simplified Chinese to Traditional Chinese automatically.

## 📁 Files

- **app.py**
+ The Gradio app code implementing ASR and diarization pipelines with UI.

- **requirements.txt**
+ Lists Python dependencies including Gradio, Transformers, Torch, pyannote.audio, pydub, opencc-python-reimplemented, and others.
+
+ - **README.md**
+ This documentation file.
+
+ ## ⚠️ Notes
+
+ - You must set your Hugging Face API token in the Space Secrets as `HF_TOKEN` for diarization to work properly (pyannote models are gated).
+ - The app falls back to an older diarization pipeline version if the newer gated version is not accessible.
+ - Simplified to Traditional Chinese conversion uses `opencc-python-reimplemented`.
+
+ ## 🛠️ Dependencies
+
+ - Python 3.8+
+ - PyTorch
+ - Transformers
+ - Gradio >= 5.0
+ - pyannote.audio (for diarization)
+ - pydub (audio processing)
+ - opencc-python-reimplemented (Chinese script conversion)
+
+ ## License
+
+ MIT License
+
+ ---
+
+ If you find issues or want to suggest improvements, please open an issue or a PR.