---
language: multilingual
license: apache-2.0
datasets:
- voxceleb2
libraries:
- speechbrain
tags:
- height-estimation
- speaker-characteristics
- speaker-recognition
- audio-classification
- voice-analysis
---
# Height Estimation Model
This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a support vector regressor (SVR) to predict speaker height from audio input. It was trained on VoxCeleb2 and evaluated on the VoxCeleb2 and TIMIT test sets.
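As a rough illustration of the regression stage, the sketch below evaluates an SVR decision function on an embedding vector. It assumes an RBF kernel, which this card does not specify, and every name and number in it (`rbf_kernel`, `svr_predict`, the support vectors, dual coefficients, intercept, and `gamma`) is hypothetical rather than taken from the released regressor. The toy vectors are 2-dimensional; real ECAPA-TDNN embeddings are much larger.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2) between two equal-length vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def svr_predict(embedding, support_vectors, dual_coefs, intercept, gamma):
    """SVR decision function: intercept + sum_i coef_i * K(sv_i, embedding)."""
    return intercept + sum(
        coef * rbf_kernel(sv, embedding, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Toy 2-dimensional "embeddings"; all values are made up for illustration.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [2.0, -1.5]
height_cm = svr_predict(
    [0.0, 0.0], support_vectors, dual_coefs, intercept=170.0, gamma=0.5
)
print(f"Predicted height: {height_cm:.1f} cm")
```

In a fitted regressor the support vectors, coefficients, and intercept come from training; only the shape of the computation is shown here.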
## Model Details
- Input: Audio file (converted to 16 kHz, single-channel audio)
- Output: Predicted height in centimeters (continuous value)
- Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
- Regressor: Support Vector Regression optimized through Optuna
- Performance:
  - VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
  - TIMIT test set: 6.02 cm Mean Absolute Error (MAE)
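The MAE figures above are the average absolute difference, in centimeters, between true and predicted heights. A minimal sketch of that computation follows; the function name `mean_absolute_error` and the sample heights are illustrative assumptions, not values from VoxCeleb2 or TIMIT.

```python
def mean_absolute_error(true_cm, predicted_cm):
    """Average absolute difference between true and predicted heights (cm)."""
    if len(true_cm) != len(predicted_cm) or not true_cm:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_cm, predicted_cm)) / len(true_cm)


# Hypothetical heights, not taken from any evaluation set.
true_heights = [170.0, 182.0, 165.0]
predicted_heights = [176.0, 178.0, 173.0]
mae = mean_absolute_error(true_heights, predicted_heights)
print(f"MAE: {mae:.2f} cm")
```

An MAE of about 6 cm therefore means that individual predictions are off by roughly 6 cm on average, in either direction.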
## Training Data
The model was trained on a height-enriched version of the VoxCeleb2 dataset (see the paper for details):
- Audio preprocessing:
  - Converted to WAV format, single channel, 16 kHz sampling rate, 256 kb/s bitrate
  - Applied SileroVAD for voice activity detection, keeping the first voiced segment
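The preprocessing steps above can be sketched in plain Python. The first helper checks that a 256 kb/s bitrate is consistent with 16 kHz mono audio, assuming 16-bit PCM samples (the bit depth is not stated in this card). The second keeps the first voiced segment, assuming the VAD output is a list of dicts with `start` and `end` sample indices; that input shape and both function names are assumptions for illustration, and no VAD model is actually run here.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bitrate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def first_voiced_segment(samples, speech_timestamps):
    """Return the samples of the earliest voiced segment, or [] if none."""
    if not speech_timestamps:
        return []
    first = min(speech_timestamps, key=lambda seg: seg["start"])
    return samples[first["start"]:first["end"]]


# 16 kHz, 16-bit (assumed), mono
bitrate = pcm_bitrate_kbps(16000, 16, 1)

# Stand-in audio and hypothetical VAD timestamps (in samples).
samples = list(range(10))
timestamps = [{"start": 2, "end": 5}, {"start": 7, "end": 9}]
segment = first_voiced_segment(samples, timestamps)
print(bitrate, segment)
```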
## Installation
You can install the package directly from GitHub:
```bash
pip install git+https://github.com/griko/voice-height-regression.git
```
## Usage
```python
from voice_height_regressor import HeightRegressionPipeline

# Load the pipeline
regressor = HeightRegressionPipeline.from_pretrained(
    "griko/height_reg_svr_ecapa_voxceleb"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted height: {result[0]:.1f} cm")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted heights: {[f'{h:.1f}' for h in results]} cm")
```
## Limitations
- The model was trained on celebrity voices from YouTube interviews
- Performance may vary with audio quality and recording conditions
- Height predictions are estimates and should not be used for medical or legal purposes
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework},
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579},
}
```