---

language: multilingual
license: apache-2.0
datasets:
- voxceleb2
libraries:
- speechbrain
tags:
- height-estimation
- speaker-characteristics
- speaker-recognition
- audio-classification
- voice-analysis
---


# Height Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a Support Vector Regression (SVR) regressor to predict speaker height from audio input. It was trained on VoxCeleb2 and evaluated on the VoxCeleb2 and TIMIT test sets.

## Model Details
- Input: Audio file (converted to 16 kHz mono before processing)
- Output: Predicted height in centimeters (continuous value)
- Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
- Regressor: Support Vector Regression optimized through Optuna
- Performance: 
  - VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
  - TIMIT test set: 6.02 cm Mean Absolute Error (MAE)
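Mean Absolute Error is the average absolute difference between predicted and true heights, so an MAE of about 6 cm means predictions are off by roughly 6 cm on average. The following is a minimal sketch of that metric in plain Python; the function name and the sample values are illustrative only and are not taken from the evaluation data or the package.

```python
from typing import Sequence


def mean_absolute_error(predicted_cm: Sequence[float],
                        actual_cm: Sequence[float]) -> float:
    """Average absolute difference between predicted and true heights (cm)."""
    if len(predicted_cm) != len(actual_cm):
        raise ValueError("predicted and actual must have the same length")
    if len(predicted_cm) == 0:
        raise ValueError("at least one sample is required")
    total = sum(abs(p - a) for p, a in zip(predicted_cm, actual_cm))
    return total / len(predicted_cm)


# Made-up values for illustration (not from VoxCeleb2 or TIMIT).
predicted = [172.0, 181.5, 165.0, 178.0]
actual = [170.0, 185.0, 168.0, 178.0]

mae = mean_absolute_error(predicted, actual)
print(f"MAE: {mae:.3f} cm")  # prints "MAE: 2.125 cm"
```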

## Training Data
The model was trained on the VoxCeleb2 dataset:
- Audio preprocessing: 
  - Converted to WAV format, single channel, 16 kHz sampling rate, 256 kb/s bitrate
  - Applied SileroVAD for voice activity detection, taking the first voiced segment
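To check that a file matches the format described above, the Python standard library's `wave` module is enough. The sketch below assumes 16-bit PCM samples, which the model card does not state explicitly but which is the sample width at which 16 kHz mono audio works out to 256 kbit/s. The helper name `pcm_bitrate_kbps` is hypothetical, and the audio is one second of generated silence rather than real data.

```python
import io
import wave


def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Write one second of silence to an in-memory WAV file:
# mono, 16 kHz, 2 bytes (16 bits) per sample -- the 16-bit depth is an assumption.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_out:
    wav_out.setnchannels(1)
    wav_out.setsampwidth(2)
    wav_out.setframerate(16000)
    wav_out.writeframes(b"\x00\x00" * 16000)

# Read the header back and verify the parameters.
buffer.seek(0)
with wave.open(buffer, "rb") as wav_in:
    channels = wav_in.getnchannels()
    sample_rate = wav_in.getframerate()
    bits = wav_in.getsampwidth() * 8

bitrate = pcm_bitrate_kbps(sample_rate, bits, channels)
print(f"{channels} channel(s), {sample_rate} Hz, {bits}-bit, {bitrate:.0f} kb/s")
```

The same header check can be pointed at a file on disk by passing its path to `wave.open` instead of the in-memory buffer.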

## Installation

You can install the package directly from GitHub:

```bash
pip install git+https://github.com/griko/voice-height-regression.git
```

## Usage

```python
from height_regressor import HeightRegressionPipeline

# Load the pipeline
regressor = HeightRegressionPipeline.from_pretrained(
    "griko/height_reg_svr_ecapa_voxceleb"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted height: {result[0]:.1f} cm")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted heights: {[f'{h:.1f}' for h in results]} cm")
```
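When running batch predictions, it can help to keep each result paired with its source file. The sketch below shows one way to do that. The helper `predict_heights` is hypothetical and not part of the package, and `fake_regressor` is a stand-in that returns fixed values so the example runs without downloading the model. It assumes only what the usage above shows: the pipeline is callable on a list of paths and returns one height per path.

```python
from typing import Callable, Dict, List, Sequence


def predict_heights(regressor: Callable[[List[str]], Sequence[float]],
                    paths: Sequence[str]) -> Dict[str, float]:
    """Run a batch prediction and map each audio path to its predicted height (cm)."""
    path_list = list(paths)
    heights = regressor(path_list)
    if len(heights) != len(path_list):
        raise ValueError("regressor returned a different number of results than inputs")
    return {path: float(height) for path, height in zip(path_list, heights)}


def fake_regressor(paths: List[str]) -> List[float]:
    """Stand-in for the real pipeline: returns placeholder heights, not predictions."""
    return [170.0 + index for index in range(len(paths))]


report = predict_heights(fake_regressor, ["audio1.wav", "audio2.wav"])
for path, height in report.items():
    print(f"{path}: {height:.1f} cm")
```

With the real pipeline loaded, `fake_regressor` would be replaced by the `regressor` object from the usage example.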

## Limitations
- Model was trained on celebrity voices from YouTube interviews
- Performance may vary on different audio qualities or recording conditions
- Height predictions are estimates and should not be used for medical or legal purposes

## Citation
If you use this model in your research, please cite:
```bibtex

TBD

```