---
language: multilingual
license: apache-2.0
datasets:
- voxceleb2
libraries:
- speechbrain
tags:
- age-estimation
- speaker-characteristics
- speaker-recognition
- audio-regression
- voice-analysis
---

# Age Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a support vector regression (SVR) model to predict speaker age from audio input. The model was trained on the VoxCeleb2 dataset.

## Model Performance Comparison

We provide multiple pre-trained models with different architectures and feature sets. The table below compares their performance:

| Model | Architecture | Features | Training Data | Test MAE | Best For |
|-------|-------------|----------|---------------|-----------|----------|
| VoxCeleb2 SVR (223) | SVR | ECAPA + Librosa (223-dim) | VoxCeleb2 | 7.88 years | Best performance on VoxCeleb2 |
| VoxCeleb2 SVR (192) | SVR | ECAPA only (192-dim) | VoxCeleb2 | 7.89 years | Lightweight deployment |
| TIMIT ANN (192) | ANN | ECAPA only (192-dim) | TIMIT | 4.95 years | Clean studio recordings |
| Combined ANN (223) | ANN | ECAPA + Librosa (223-dim) | VoxCeleb2 + TIMIT | 6.93 years | Best general performance |

You may find other models [here](https://huggingface.co/griko).

## Model Details

- Input: Audio file (converted to 16 kHz, single channel)
- Output: Predicted age in years (continuous value)
- Features: SpeechBrain ECAPA-TDNN embedding (192 features)
- Regressor: Support Vector Regression optimized through Optuna
- Performance:
  - VoxCeleb2 test set: 7.89 years Mean Absolute Error (MAE)

## Features

1. SpeechBrain ECAPA-TDNN embeddings (192 dimensions)

## Training Data

The model was trained on the VoxCeleb2 dataset:

- Audio preprocessing:
  - Converted to WAV format, single channel, 16 kHz sampling rate
  - Applied SileroVAD for voice activity detection, taking the first voiced segment
- Age data was collected from Wikidata and public sources

## Installation

```bash
pip install git+https://github.com/griko/voice-age-regression.git#egg=voice-age-regressor[svr-ecapa-voxceleb2]
```

## Usage

```python
from age_regressor import AgeRegressionPipeline

# Load the pipeline
regressor = AgeRegressionPipeline.from_pretrained(
    "griko/age_reg_svr_ecapa_voxceleb2"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted age: {result[0]:.1f} years")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted ages: {[f'{age:.1f}' for age in results]} years")
```

## Limitations

- The model was trained on celebrity voices from YouTube interview recordings
- Performance may vary with audio quality and recording conditions
- Age predictions are approximate estimates, not exact measurements, and should not be used for medical or legal purposes

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{koushnir2025vanpyvoiceanalysisframework,
  title={VANPY: Voice Analysis Framework},
  author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
  year={2025},
  eprint={2502.17579},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2502.17579},
}
```
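
## Computing MAE

The performance figures in this card are reported as Mean Absolute Error (MAE) in years. As a minimal sketch of what that metric means, the snippet below averages the absolute differences between true and predicted ages. The function name `mean_absolute_error` and the example values are illustrative assumptions; they are not part of the `age_regressor` package, and the numbers are not drawn from VoxCeleb2 or from this model's outputs.

```python
from typing import Sequence


def mean_absolute_error(
    true_ages: Sequence[float], predicted_ages: Sequence[float]
) -> float:
    """Return the mean of |true - predicted| over paired samples, in years."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("true_ages and predicted_ages must have the same length")
    if len(true_ages) == 0:
        raise ValueError("at least one sample is required")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Made-up illustrative values (hypothetical, not real evaluation data).
example_true = [30.0, 45.0, 62.0]
example_pred = [34.0, 41.0, 60.0]
example_mae = mean_absolute_error(example_true, example_pred)
print(f"MAE: {example_mae:.2f} years")
```

To evaluate on your own labeled recordings, the predicted list could be filled with the values returned by the pipeline shown in the Usage section, assuming the true ages are available in the same order.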