---

language: multilingual
license: apache-2.0
datasets:
- voxceleb2
libraries:
- speechbrain
tags:
- height-estimation
- speaker-characteristics
- speaker-recognition
- audio-classification
- voice-analysis
---


# Height Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an SVR regressor to predict speaker height from audio input. It was trained on VoxCeleb2 and evaluated on the VoxCeleb2 and TIMIT test sets.

## Model Details
- Input: Audio file (converted to 16 kHz, single channel)
- Output: Predicted height in centimeters (continuous value)
- Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
- Regressor: Support Vector Regression optimized through Optuna
- Performance: 
  - VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
  - TIMIT test set: 6.02 cm Mean Absolute Error (MAE)
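
For reference, the MAE figures above are the average absolute difference between predicted and true heights. The following is a minimal sketch of that metric in plain Python; the function name and the sample values are illustrative assumptions, not taken from the evaluation code or data.

```python
from typing import Sequence


def mean_absolute_error(predicted_cm: Sequence[float], actual_cm: Sequence[float]) -> float:
    """Average absolute difference between predicted and true heights, in cm."""
    if len(predicted_cm) != len(actual_cm):
        raise ValueError("predicted and actual must have the same length")
    if not predicted_cm:
        raise ValueError("at least one sample is required")
    return sum(abs(p - a) for p, a in zip(predicted_cm, actual_cm)) / len(predicted_cm)


# Made-up values for illustration only (not real evaluation data).
predicted = [172.0, 181.5, 165.0]
actual = [170.0, 185.0, 165.5]
print(f"MAE: {mean_absolute_error(predicted, actual):.2f} cm")  # prints "MAE: 2.00 cm"
```

An MAE of about 6 cm therefore means that, on average, a prediction is off by roughly 6 cm in either direction.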

## Training Data
The model was trained on a height-enriched VoxCeleb2 dataset (see the paper cited below for details):
- Audio preprocessing: 
  - Converted to WAV format, single channel, 16 kHz sampling rate, 256 kb/s bitrate
  - Applied SileroVAD for voice activity detection, taking the first voiced segment
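
The "first voiced segment" step can be sketched as follows. This is an illustration only, not the authors' preprocessing code: it assumes the VAD output is a list of dicts with `start` and `end` sample indices (the shape Silero VAD's `get_speech_timestamps` returns by default), and `first_voiced_segment` is a hypothetical helper name.

```python
from typing import Dict, List, Sequence


def first_voiced_segment(
    samples: Sequence[float],
    speech_timestamps: List[Dict[str, int]],
) -> List[float]:
    """Return the samples of the earliest voiced segment, or [] if none was detected.

    Assumes each timestamp is {"start": int, "end": int} in sample indices,
    with "end" exclusive.
    """
    if not speech_timestamps:
        return []
    # Pick the segment that starts earliest, in case the list is unsorted.
    first = min(speech_timestamps, key=lambda seg: seg["start"])
    return list(samples[first["start"]:first["end"]])


# Toy waveform and hand-written timestamps standing in for real VAD output.
waveform = [0.0, 0.0, 0.3, 0.5, 0.4, 0.0, 0.0, 0.2, 0.1, 0.0]
timestamps = [{"start": 2, "end": 5}, {"start": 7, "end": 9}]
print(first_voiced_segment(waveform, timestamps))  # prints [0.3, 0.5, 0.4]
```

Keeping only the first segment means later speech in the same recording is ignored, so very short opening utterances may give the embedding model little audio to work with.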

## Installation

You can install the package directly from GitHub:

```bash
pip install git+https://github.com/griko/voice-height-regression.git
```

## Usage

```python
from voice_height_regressor import HeightRegressionPipeline

# Load the pipeline
regressor = HeightRegressionPipeline.from_pretrained(
    "griko/height_reg_svr_ecapa_voxceleb"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted height: {result[0]:.1f} cm")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted heights: {[f'{h:.1f}' for h in results]} cm")
```

## Limitations
- The model was trained on celebrity voices from YouTube interviews, so it may not generalize equally well to other speaker populations
- Performance may vary on different audio qualities or recording conditions
- Height predictions are estimates and should not be used for medical or legal purposes

## Citation
If you use this model in your research, please cite:
```bibtex
@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework},
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579},
}
```