Create Readme.md
Readme.md
ADDED
@@ -0,0 +1,20 @@
---
license: apache-2.0
---

This repository contains the model checkpoints related to the paper: [Less is More for Synthetic Speech Detection in the Wild](https://arxiv.org/abs/2502.05674).

## 🔥 Key Features
- 3000+ hours of synthetic speech
- **Diverse Distribution Shifts**: The dataset spans **7 key distribution shifts**, including:
  - **Reading Style**
  - 🎙️ **Podcast**
  - 🎥 **YouTube**
  - 🗣️ **Languages (Three different languages)**
  - **Demographics (including variations in age, accent, and gender)**
- **Multiple Speech Generation Systems**: Includes data synthesized from various **TTS models** and **vocoders**.

## 💡 Why We Built This Dataset
> Driven by advances in self-supervised learning for speech, state-of-the-art synthetic speech detectors have achieved low error rates on popular benchmarks such as ASVspoof. However, prior benchmarks do not address the wide range of real-world variability in speech. Are reported error rates realistic in real-world conditions? To assess detector failure modes and robustness under controlled distribution shifts, we introduce **ShiftySpeech**, a benchmark with more than 3000 hours of synthetic speech from 7 domains, 6 TTS systems, 12 vocoders, and 3 languages.
>
> **Stay tuned! More model checkpoints will be available soon.**