---
language: multilingual
license: apache-2.0
datasets:
- voxceleb2
libraries:
- speechbrain
tags:
- height-estimation
- speaker-characteristics
- speaker-recognition
- audio-classification
- voice-analysis
---
# Height Estimation Model
This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a Support Vector Regression (SVR) regressor to predict a speaker's height from audio input. It was trained on the VoxCeleb2 dataset and evaluated on the VoxCeleb2 and TIMIT datasets.
## Model Details
- Input: Audio file (converted to 16 kHz, single-channel audio before inference)
- Output: Predicted height in centimeters (continuous value)
- Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
- Regressor: Support Vector Regression (SVR), with hyperparameters tuned using Optuna
- Performance:
  - VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
  - TIMIT test set: 6.02 cm Mean Absolute Error (MAE)
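
For readers unfamiliar with the metric reported above, MAE is the average absolute difference between the true and predicted heights. The following is a minimal sketch only; the function name and the sample values are hypothetical and are not taken from the model's evaluation data.

```python
def mean_absolute_error(true_cm, predicted_cm):
    """Average absolute difference between true and predicted heights, in cm."""
    if len(true_cm) != len(predicted_cm) or not true_cm:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_cm, predicted_cm)) / len(true_cm)


# Hypothetical values for illustration only (not real evaluation data)
true_heights = [170.0, 182.0, 165.0, 178.0]
predicted_heights = [174.0, 176.0, 168.0, 181.0]

mae = mean_absolute_error(true_heights, predicted_heights)
print(f"MAE: {mae:.2f} cm")  # absolute errors are 4, 6, 3, 3, so this prints "MAE: 4.00 cm"
```

An MAE of about 6 cm therefore means that predictions are off by roughly 6 cm on average, in either direction.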
## Training Data
The model was trained on a height-annotated version of the VoxCeleb2 dataset (see the paper cited below for details):
- Audio preprocessing:
  - Converted to WAV format, single channel, 16 kHz sampling rate, 256 kb/s bitrate
  - Applied SileroVAD for voice activity detection, taking the first voiced segment
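
The bitrate listed above is consistent with uncompressed PCM audio at 16 kHz and a single channel, assuming 16-bit samples. The bit depth is an assumption here, since it is not stated in this card. A quick check of the arithmetic, using a hypothetical helper name:

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Assumption: 16-bit samples (not stated in this card)
bitrate = pcm_bitrate_kbps(sample_rate_hz=16_000, bits_per_sample=16, channels=1)
print(f"{bitrate:.0f} kb/s")  # 16000 * 16 * 1 / 1000 = 256
```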
## Installation
You can install the package directly from GitHub:
```bash
pip install git+https://github.com/griko/voice-height-regression.git
```
## Usage
```python
from voice_height_regressor import HeightRegressionPipeline

# Load the pipeline
regressor = HeightRegressionPipeline.from_pretrained(
    "griko/height_reg_svr_ecapa_voxceleb"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted height: {result[0]:.1f} cm")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted heights: {[f'{h:.1f}' for h in results]} cm")
```
## Limitations
- The model was trained on celebrity voices from YouTube interviews
- Performance may vary with audio quality and recording conditions
- Height predictions are estimates and should not be used for medical or legal purposes
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{koushnir2025vanpyvoiceanalysisframework,
title={VANPY: Voice Analysis Framework},
author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
year={2025},
eprint={2502.17579},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2502.17579},
}
```