---
license: apache-2.0
---

This repository contains the model checkpoints for the paper [Less is More for Synthetic Speech Detection in the Wild](https://arxiv.org/abs/2502.05674).

The ShiftySpeech dataset can be downloaded from [Hugging Face](https://huggingface.co/datasets/ash56/ShiftySpeech/tree/main).
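
As a minimal sketch, the dataset linked above could also be fetched programmatically with the `huggingface_hub` library. The repository ID is taken from the dataset URL; the helper names, the default `local_dir`, and any subfolder name you pass are illustrative assumptions, not part of the official release, so check the dataset page for the actual folder layout.

```python
from typing import List, Optional

# Repository ID taken from the dataset URL in this README.
REPO_ID = "ash56/ShiftySpeech"


def build_patterns(subfolder: Optional[str]) -> Optional[List[str]]:
    """Return glob patterns limiting the download to one subfolder.

    Returns None (download everything) when no subfolder is given.
    """
    if not subfolder:
        return None
    return [f"{subfolder.strip('/')}/*"]


def download_shiftyspeech(local_dir: str = "ShiftySpeech",
                          subfolder: Optional[str] = None) -> str:
    """Download the dataset (or one subfolder) and return the local path.

    `local_dir` and `subfolder` are hypothetical defaults for illustration.
    """
    # Imported here so the helpers above work without the dependency.
    # Install with: pip install huggingface_hub
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the repo is a dataset, not a model
        local_dir=local_dir,
        allow_patterns=build_patterns(subfolder),
    )
```

Calling `download_shiftyspeech()` fetches the full dataset, which is large (3000+ hours of audio). Passing `subfolder` restricts the download to files under that folder, which may be preferable when only one distribution shift is needed.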

## 🔥 Key Features
- 3000+ hours of synthetic speech
- **Diverse Distribution Shifts**: The dataset spans **7 key distribution shifts**, including:
  - 📖 **Reading Style**
  - 🎙️ **Podcast**
  - 🎥 **YouTube**
  - 🗣️ **Languages (three different languages)**
  - 🌎 **Demographics (including variations in age, accent, and gender)**
- **Multiple Speech Generation Systems**: Includes data synthesized from various **TTS models** and **vocoders**.
  
## 💡 Why We Built This Dataset
> Driven by advances in self-supervised learning for speech, state-of-the-art synthetic speech detectors have achieved low error rates on popular benchmarks such as ASVspoof. However, prior benchmarks do not address the wide range of real-world variability in speech. Are reported error rates realistic in real-world conditions? To assess detector failure modes and robustness under controlled distribution shifts, we introduce **ShiftySpeech**, a benchmark with more than 3000 hours of synthetic speech from 7 domains, 6 TTS systems, 12 vocoders, and 3 languages.

🚀 **Stay tuned! More model checkpoints will be available soon.**