---

language: multilingual
license: apache-2.0
datasets:
- voxceleb2
libraries:
- speechbrain
tags:
- gender-classification
- speaker-characteristics
- speaker-recognition
- audio-classification
- voice-analysis
---


# Gender Classification Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an SVM classifier to predict speaker gender from audio input. The model was trained and evaluated on the VoxCeleb2, Mozilla Common Voice v10.0, and TIMIT datasets.

## Model Details
- Input: Audio file (automatically converted to 16 kHz, mono)
- Output: Gender prediction ("male" or "female")
- Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
- Classifier: Support Vector Machine optimized through Optuna (200 trials)
- Performance: 
  - VoxCeleb2 test set: 98.9% accuracy, 0.9885 F1-score
  - Mozilla Common Voice v10.0 English validated test set: 92.3% accuracy
  - TIMIT test set: 99.6% accuracy
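
The two-stage design can be illustrated with a toy sketch: a scikit-learn SVM fitted on 192-dimensional vectors standing in for ECAPA-TDNN embeddings. The random embeddings, the feature scaling, and the RBF kernel here are illustrative assumptions, not the released model's actual configuration.

```python
# Toy sketch of the second stage: an SVM over 192-dim speaker embeddings.
# The embeddings are random stand-ins for real SpeechBrain ECAPA-TDNN
# outputs; scaler and kernel choice are assumptions for illustration.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 192))          # 200 "speakers" x 192 dims
y = np.array(["female", "male"] * 100)
X[y == "male"] += 0.5                    # synthetic class separation

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)
print(clf.predict(X[:2]))
```

In the real pipeline the embeddings come from SpeechBrain's pretrained ECAPA-TDNN model, and the SVM hyperparameters were selected by Optuna rather than fixed by hand.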

## Training Data
The model was trained on the VoxCeleb2 dataset:
- Training set: 1,691 speakers (845 females, 846 males)
- Validation set: 785 speakers (396 females, 389 males)
- Test set: 1,647 speakers (828 females, 819 males)
- No speaker overlap between sets
- Audio preprocessing: 
  - Converted to WAV format, single channel, 16 kHz sampling rate, 256 kbps bitrate
  - Applied SileroVAD for voice activity detection, taking the first voiced segment
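
The downmix and resampling steps above can be sketched in a few lines; the audio here is synthetic noise, and the Silero VAD step is indicated only as a comment since it requires loading the VAD model.

```python
# Rough sketch of the described preprocessing: downmix to mono, then
# resample to 16 kHz. Input is 1 s of synthetic stereo noise at 44.1 kHz.
import numpy as np
from scipy.signal import resample_poly

sr_in = 44100
stereo = np.random.default_rng(1).normal(size=(2, sr_in))

mono = stereo.mean(axis=0)                     # single channel
audio_16k = resample_poly(mono, 16000, sr_in)  # 16 kHz sampling rate

# Silero VAD would then return voiced (start, end) timestamps for
# audio_16k, and the pipeline keeps the first voiced segment.
print(audio_16k.shape)  # (16000,)
```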

## Installation

You can install the package directly from GitHub:

```bash
pip install git+https://github.com/griko/voice-gender-classification.git
```

## Usage

```python
from voice_gender_classification import GenderClassificationPipeline

# Load the pipeline
classifier = GenderClassificationPipeline.from_pretrained(
    "griko/gender_cls_svm_ecapa_voxceleb"
)

# Single file prediction
result = classifier("path/to/audio.wav")
print(result)  # ["female"] or ["male"]

# Batch prediction
results = classifier(["audio1.wav", "audio2.wav"])
print(results)  # e.g., ["female", "male"]
```

## Limitations
- Model was trained on celebrity voices from YouTube interviews
- Performance may vary on different audio qualities or recording conditions
- Designed for binary gender classification only

## Citation
If you use this model in your research, please cite:
```bibtex
@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework},
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579},
}
```