---
language: multilingual
license: apache-2.0
datasets:
- voxceleb2
libraries:
- speechbrain
tags:
- height-estimation
- speaker-characteristics
- speaker-recognition
- audio-classification
- voice-analysis
---

# Height Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a support vector regression (SVR) model to predict speaker height from audio input. The model was trained on VoxCeleb2 and evaluated on the VoxCeleb2 and TIMIT datasets.

## Model Details

- Input: Audio file (converted to 16 kHz, single channel)
- Output: Predicted height in centimeters (continuous value)
- Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
- Regressor: Support Vector Regression optimized with Optuna
- Performance:
  - VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
  - TIMIT test set: 6.02 cm Mean Absolute Error (MAE)

## Training Data

The model was trained on a height-enriched version of the VoxCeleb2 dataset (see the paper cited below for details):

- Audio preprocessing:
  - Converted to WAV format, single channel, 16 kHz sampling rate, 256 kb/s bitrate
  - Applied SileroVAD for voice activity detection, taking the first voiced segment

## Installation

You can install the package directly from GitHub:

```bash
pip install git+https://github.com/griko/voice-height-regression.git
```

## Usage

```python
from voice_height_regressor import HeightRegressionPipeline

# Load the pipeline
regressor = HeightRegressionPipeline.from_pretrained(
    "griko/height_reg_svr_ecapa_voxceleb"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted height: {result[0]:.1f} cm")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted heights: {[f'{h:.1f}' for h in results]} cm")
```

## Limitations

- The model was trained on celebrity voices from YouTube interviews
- Performance may vary with audio quality and recording conditions
- Height predictions are estimates and should not be used for medical or legal purposes

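
## Interpreting the MAE

The performance figures under Model Details are Mean Absolute Errors: the average absolute difference, in centimeters, between predicted and true heights. The sketch below shows how that metric is computed. The helper name `mean_absolute_error` and the height values are hypothetical, chosen only for illustration, and are not taken from VoxCeleb2, TIMIT, or the package itself.

```python
def mean_absolute_error(true_cm, predicted_cm):
    """Average absolute difference between paired true and predicted heights (cm)."""
    if len(true_cm) != len(predicted_cm):
        raise ValueError("true_cm and predicted_cm must have the same length")
    if not true_cm:
        raise ValueError("at least one pair of values is required")
    return sum(abs(t - p) for t, p in zip(true_cm, predicted_cm)) / len(true_cm)


# Hypothetical values for illustration only; not real dataset entries.
true_heights = [170.0, 182.0, 165.0, 178.0]
predicted_heights = [174.0, 176.0, 169.0, 180.0]

mae = mean_absolute_error(true_heights, predicted_heights)
print(f"MAE: {mae:.2f} cm")  # absolute errors are 4, 6, 4, 2, so this prints "MAE: 4.00 cm"
```

An MAE of roughly 6 cm therefore means that an individual prediction is off by about 6 cm on average, so single predictions are best read as coarse estimates.
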
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{koushnir2025vanpyvoiceanalysisframework,
      title={VANPY: Voice Analysis Framework},
      author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
      year={2025},
      eprint={2502.17579},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2502.17579},
}
```