---
language: multilingual
license: apache-2.0
datasets:
- voxceleb2
libraries:
- speechbrain
tags:
- height-estimation
- speaker-characteristics
- speaker-recognition
- audio-classification
- voice-analysis
---

# Height Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an SVR regressor to predict speaker height from audio input. The model was trained on VoxCeleb2 and evaluated on the VoxCeleb2 and TIMIT datasets.

## Model Details

- Input: Audio file (will be converted to 16 kHz, single channel)
- Output: Predicted height in centimeters (continuous value)
- Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
- Regressor: Support Vector Regression (SVR) with hyperparameters optimized through Optuna
- Performance:
  - VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
  - TIMIT test set: 6.02 cm Mean Absolute Error (MAE)

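To make the MAE figures above concrete, here is a minimal sketch of how the metric is computed. The function name and the sample heights are hypothetical and chosen only for illustration; they are not taken from the evaluation sets.

```python
def mean_absolute_error(true_cm, predicted_cm):
    """Return the mean absolute error between two equal-length sequences (in cm)."""
    if len(true_cm) != len(predicted_cm):
        raise ValueError("inputs must have the same length")
    if not true_cm:
        raise ValueError("inputs must not be empty")
    # Average of the absolute per-speaker differences.
    return sum(abs(t - p) for t, p in zip(true_cm, predicted_cm)) / len(true_cm)


# Hypothetical values for illustration only.
true_heights = [170.0, 182.0, 165.0]
predicted_heights = [174.0, 176.0, 167.0]

mae = mean_absolute_error(true_heights, predicted_heights)
print(f"MAE: {mae:.2f} cm")
```

An MAE of about 6 cm therefore means that, averaged over the test speakers, predictions are off by roughly 6 cm in either direction.
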
## Training Data

The model was trained on a height-enriched version of the VoxCeleb2 dataset (see the paper for details):

- Audio preprocessing:
  - Converted to WAV format, single channel, 16 kHz sampling rate, 256 kb/s bitrate
  - Applied SileroVAD for voice activity detection, taking the first voiced segment

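The "first voiced segment" step can be sketched as follows. This is an illustrative sketch, not the authors' preprocessing code: it assumes the detector reports speech as a list of dictionaries with `start` and `end` sample indices (the shape Silero VAD's `get_speech_timestamps` helper is understood to return by default), and `first_voiced_segment` is a hypothetical helper name.

```python
def first_voiced_segment(samples, speech_timestamps):
    """Return the samples of the earliest detected speech segment.

    `speech_timestamps` is assumed to be a list of dicts with integer
    "start" and "end" sample indices. Returns an empty list when no
    speech was detected.
    """
    if not speech_timestamps:
        return []
    # Pick the segment that begins first, regardless of list order.
    first = min(speech_timestamps, key=lambda seg: seg["start"])
    return samples[first["start"]:first["end"]]


# Toy signal: sample values equal their own index, so slices are easy to read.
samples = list(range(10))
timestamps = [{"start": 6, "end": 9}, {"start": 2, "end": 5}]

segment = first_voiced_segment(samples, timestamps)
print(segment)
```
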
## Installation

You can install the package directly from GitHub:

```bash
pip install git+https://github.com/griko/voice-height-regression.git
```

## Usage

```python
from voice_height_regressor import HeightRegressionPipeline

# Load the pipeline
regressor = HeightRegressionPipeline.from_pretrained(
    "griko/height_reg_svr_ecapa_voxceleb"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted height: {result[0]:.1f} cm")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted heights: {[f'{h:.1f}' for h in results]} cm")
```

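For batch runs it can help to keep each prediction next to the file it came from. The sketch below assumes the pipeline returns one height per input path, in the same order as the inputs, which is what the batch call above suggests. `pair_predictions` is a hypothetical helper, and the heights are stand-in values rather than real model output.

```python
def pair_predictions(paths, heights_cm):
    """Map each audio path to its predicted height, rounded to 0.1 cm."""
    if len(paths) != len(heights_cm):
        raise ValueError("expected one prediction per input path")
    return {path: round(float(h), 1) for path, h in zip(paths, heights_cm)}


paths = ["audio1.wav", "audio2.wav"]
# Stand-in values; in practice these would come from regressor(paths).
heights = [171.234, 180.56]

report = pair_predictions(paths, heights)
for path, height in report.items():
    print(f"{path}: {height:.1f} cm")
```
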
## Limitations

- Model was trained on celebrity voices from YouTube interviews
- Performance may vary on different audio qualities or recording conditions
- Height predictions are estimates and should not be used for medical or legal purposes

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework},
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579},
}
```