	---
	language: multilingual
	license: apache-2.0
	datasets:
	- voxceleb2
	libraries:
	- speechbrain
	tags:
	- height-estimation
	- speaker-characteristics
	- speaker-recognition
	- audio-classification
	- voice-analysis
	---

	# Height Estimation Model

	This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a Support Vector Regression (SVR) model to predict speaker height from audio input. It was trained on VoxCeleb2 and evaluated on the VoxCeleb2 and TIMIT test sets.

	## Model Details
	- Input: Audio file (will be converted to 16 kHz, single channel)
	- Output: Predicted height in centimeters (continuous value)
	- Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
	- Regressor: Support Vector Regression optimized through Optuna
	- Performance:
	  - VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
	  - TIMIT test set: 6.02 cm Mean Absolute Error (MAE)
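
The MAE figures above are the average absolute difference between predicted and true heights. A minimal sketch of the metric follows; the function name and the heights are made up purely for illustration and are not taken from the model's evaluation:

```python
def mean_absolute_error(true_cm, predicted_cm):
    """Average absolute difference between true and predicted heights, in cm."""
    if len(true_cm) != len(predicted_cm):
        raise ValueError("inputs must have the same length")
    if not true_cm:
        raise ValueError("inputs must not be empty")
    return sum(abs(t - p) for t, p in zip(true_cm, predicted_cm)) / len(true_cm)


# Hypothetical example values, not results from the model.
true_heights = [170.0, 182.0, 165.0]
predicted_heights = [175.0, 178.0, 171.0]

# Absolute errors are 5, 4 and 6 cm, so the mean is 5 cm.
mae = mean_absolute_error(true_heights, predicted_heights)
print(f"MAE: {mae:.2f} cm")
```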

	## Training Data
	The model was trained on a height-enriched VoxCeleb2 dataset (see the paper in the Citation section for details):
	- Audio preprocessing:
	  - Converted to WAV format, single channel, 16 kHz sampling rate, 256 kb/s bitrate
	  - Applied SileroVAD for voice activity detection, taking the first voiced segment
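
Taking the first voiced segment can be sketched as follows. This is an illustration, not the authors' preprocessing code. It assumes the VAD step has already produced a list of segments with `start` and `end` sample indices (the shape Silero VAD's `get_speech_timestamps` is assumed to return), and the helper name `first_voiced_segment` is hypothetical:

```python
def first_voiced_segment(samples, speech_timestamps):
    """Return the samples of the earliest voiced segment, or an empty list.

    samples: sequence of mono audio samples at 16 kHz.
    speech_timestamps: list of {"start": int, "end": int} dicts, in samples
        (assumed VAD output format).
    """
    if not speech_timestamps:
        return []
    # Pick the segment that starts earliest, in case the list is unsorted.
    first = min(speech_timestamps, key=lambda seg: seg["start"])
    return list(samples[first["start"]:first["end"]])


SAMPLE_RATE = 16_000
# Two seconds of placeholder audio and two hypothetical voiced regions.
audio = [0.0] * (2 * SAMPLE_RATE)
timestamps = [{"start": 4_000, "end": 12_000}, {"start": 20_000, "end": 28_000}]

segment = first_voiced_segment(audio, timestamps)
print(f"First voiced segment: {len(segment) / SAMPLE_RATE:.2f} s")
```

Returning an empty list when no speech is detected leaves it to the caller to decide whether to skip the file or fall back to the full recording.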

	## Installation

	You can install the package directly from GitHub:

	```bash
	pip install git+https://github.com/griko/voice-height-regression.git
	```

	## Usage

	```python
	from voice_height_regressor import HeightRegressionPipeline

	# Load the pipeline
	regressor = HeightRegressionPipeline.from_pretrained(
	    "griko/height_reg_svr_ecapa_voxceleb"
	)

	# Single file prediction
	result = regressor("path/to/audio.wav")
	print(f"Predicted height: {result[0]:.1f} cm")

	# Batch prediction
	results = regressor(["audio1.wav", "audio2.wav"])
	print(f"Predicted heights (cm): {', '.join(f'{h:.1f}' for h in results)}")
	```
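
When predicting in batches, it can help to keep each prediction paired with its file. The sketch below assumes the pipeline returns one height per input path, in the same order; that is an assumption based on the usage above, not a documented guarantee. `predict_by_path` is a hypothetical helper, and `stub_regressor` stands in for the real pipeline so the snippet runs without downloading the model:

```python
def predict_by_path(regressor, paths):
    """Call the regressor once on all paths and map each path to its height in cm."""
    paths = list(paths)
    heights = regressor(paths)
    if len(heights) != len(paths):
        raise ValueError("expected one prediction per input path")
    return {path: float(height) for path, height in zip(paths, heights)}


def stub_regressor(paths):
    """Stand-in for HeightRegressionPipeline; returns placeholder heights."""
    return [170.0 + i for i, _ in enumerate(paths)]


predictions = predict_by_path(stub_regressor, ["audio1.wav", "audio2.wav"])
for path, height in predictions.items():
    print(f"{path}: {height:.1f} cm")
```

With the real pipeline, pass the `regressor` object from the usage example in place of `stub_regressor`.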

	## Limitations
	- Model was trained on celebrity voices from YouTube interviews
	- Performance may vary with audio quality and recording conditions
	- Height predictions are estimates and should not be used for medical or legal purposes

	## Citation
	If you use this model in your research, please cite:
	```bibtex
	@misc{koushnir2025vanpyvoiceanalysisframework,
	    title={VANPY: Voice Analysis Framework},
	    author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
	    year={2025},
	    eprint={2502.17579},
	    archivePrefix={arXiv},
	    primaryClass={cs.SD},
	    url={https://arxiv.org/abs/2502.17579},
	}
	```