---
language: multilingual
license: apache-2.0
datasets:
- voxceleb2
libraries:
- speechbrain
tags:
- height-estimation
- speaker-characteristics
- speaker-recognition
- audio-classification
- voice-analysis
---

# Height Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a Support Vector Regression (SVR) regressor to predict a speaker's height from audio. It was trained on VoxCeleb2 and evaluated on the VoxCeleb2 and TIMIT test sets.

## Model Details
- Input: Audio file (converted to 16 kHz mono before processing)
- Output: Predicted height in centimeters (continuous value)
- Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
- Regressor: Support Vector Regression (SVR) with hyperparameters tuned using Optuna
- Performance: 
  - VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
  - TIMIT test set: 6.02 cm Mean Absolute Error (MAE)
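Mean Absolute Error is the average absolute difference between the true and predicted heights, so an MAE of about 6 cm means predictions are off by roughly 6 cm on average. A minimal stdlib sketch of the metric follows; the height values in it are hypothetical and are not taken from the evaluation data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted heights (cm)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical example values, for illustration only
true_heights = [180.0, 165.0, 172.0]
predicted_heights = [174.0, 170.0, 173.0]
print(f"MAE: {mean_absolute_error(true_heights, predicted_heights):.2f} cm")  # MAE: 4.00 cm
```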

## Training Data
The model was trained on a height-enriched version of VoxCeleb2 (see the paper cited below for details):
- Audio preprocessing:
  - Converted to single-channel WAV at a 16 kHz sampling rate and 256 kbps bitrate
  - Applied Silero VAD for voice activity detection, keeping only the first voiced segment
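The two preprocessing steps can be illustrated with a small stdlib sketch. The 256 kbps figure follows from uncompressed PCM arithmetic, assuming 16-bit samples: 16,000 samples/s × 16 bits × 1 channel. The segment helper assumes VAD output in the shape returned by Silero VAD's `get_speech_timestamps` (a list of `{"start", "end"}` dicts in sample units); both helper names are hypothetical, and returning an empty clip when no speech is found is an assumption, not documented behavior of this model.

```python
def pcm_bitrate_bps(sample_rate=16_000, bits_per_sample=16, channels=1):
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate * bits_per_sample * channels


def first_voiced_segment(samples, speech_timestamps):
    """Keep only the first voiced region of a clip (hypothetical helper).

    `speech_timestamps` is assumed to be a list of {"start": int, "end": int}
    dicts in sample units, as produced by Silero VAD's get_speech_timestamps.
    """
    if not speech_timestamps:
        return samples[:0]  # assumption: no detected speech yields an empty clip
    first = speech_timestamps[0]
    return samples[first["start"]:first["end"]]


print(pcm_bitrate_bps())  # 256000
audio = list(range(10))
print(first_voiced_segment(audio, [{"start": 2, "end": 5}, {"start": 7, "end": 9}]))  # [2, 3, 4]
```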

## Installation

You can install the package directly from GitHub:

```bash
pip install git+https://github.com/griko/voice-height-regression.git
```

## Usage

```python
from voice_height_regressor import HeightRegressionPipeline

# Load the pipeline
regressor = HeightRegressionPipeline.from_pretrained(
    "griko/height_reg_svr_ecapa_voxceleb"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted height: {result[0]:.1f} cm")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted heights: {[f'{h:.1f}' for h in results]} cm")
```
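To run the batch mode over a whole folder of recordings, the paths can be collected first and the results keyed by file name. The sketch below assumes only what the batch example shows: the pipeline, called with a list of paths, returns one height per path in the same order. `predict_folder` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def predict_folder(regressor, folder, pattern="*.wav"):
    """Map each matching file name in `folder` to its predicted height (cm).

    Assumes `regressor` behaves like HeightRegressionPipeline in batch mode:
    given a list of paths, it returns one height per path, in order.
    """
    paths = sorted(Path(folder).glob(pattern))
    if not paths:
        return {}
    heights = regressor([str(p) for p in paths])
    return {p.name: float(h) for p, h in zip(paths, heights)}
```

Usage would look like `predict_folder(regressor, "path/to/recordings")`. Sorting the paths keeps the output order deterministic across runs.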

## Limitations
- The model was trained on celebrity voices from YouTube interviews
- Performance may vary with audio quality and recording conditions
- Height predictions are estimates and should not be used for medical or legal purposes

## Citation
If you use this model in your research, please cite:
```bibtex
@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework}, 
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579}, 
}
```