---
language: multilingual
license: apache-2.0
datasets:
- timit
libraries:
- speechbrain
tags:
- age-estimation
- speaker-characteristics
- speaker-recognition
- audio-regression
- voice-analysis
---

# Age Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an artificial neural network (ANN) regressor to predict a speaker's age from audio. The model was trained on the TIMIT dataset.

## Model Performance Comparison

We provide several pre-trained models with different architectures and feature sets. The table below compares their performance:

| Model | Architecture | Features | Training Data | Test MAE | Best For |
|-------|--------------|----------|---------------|----------|----------|
| VoxCeleb2 SVR (223) | SVR | ECAPA + Librosa (223-dim) | VoxCeleb2 | 7.88 years | Best performance on VoxCeleb2 |
| VoxCeleb2 SVR (192) | SVR | ECAPA only (192-dim) | VoxCeleb2 | 7.89 years | Lightweight deployment |
| TIMIT ANN (192) | ANN | ECAPA only (192-dim) | TIMIT | 4.95 years | Clean studio recordings |
| Combined ANN (223) | ANN | ECAPA + Librosa (223-dim) | VoxCeleb2 + TIMIT | 6.93 years | Best general performance |

Other models are available on the [griko Hugging Face profile](https://huggingface.co/griko).

## Model Details
- Input: Audio file (converted to 16 kHz mono)
- Output: Predicted age in years (continuous value)
- Features: SpeechBrain ECAPA-TDNN embedding (192 dimensions)
- Regressor: Artificial neural network (ANN) with hyperparameters tuned using Optuna
- Performance:
  - TIMIT test set: 4.95 years Mean Absolute Error (MAE)
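
For reference, the MAE reported above is the average absolute difference between the predicted age $\hat{y}_i$ and the annotated age $y_i$ over the $N$ test samples:

```latex
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right|
```

An MAE of 4.95 years therefore means that, on average, a prediction on the TIMIT test set is off by about 5 years in one direction or the other.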

## Features
1. SpeechBrain ECAPA-TDNN embeddings (192 dimensions)
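
To make the architecture more concrete, here is a minimal NumPy sketch of the general shape of an ANN regressor that maps a 192-dimensional ECAPA-TDNN embedding to a single age value. The layer sizes (192 → 128 → 64 → 1), the ReLU activation, and the helper names `init_mlp` and `predict_age` are hypothetical placeholders, and the weights are random. The actual architecture was selected with Optuna and is not documented here.

```python
import numpy as np

EMBED_DIM = 192  # size of a SpeechBrain ECAPA-TDNN speaker embedding


def init_mlp(layer_sizes, seed=0):
    """Create random weights for a fully connected network (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))  # He init
        b = np.zeros(n_out)
        params.append((w, b))
    return params


def predict_age(params, embeddings):
    """Forward pass: (batch, 192) or (192,) embeddings -> (batch,) ages."""
    x = np.atleast_2d(embeddings)
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:  # ReLU on hidden layers only
            x = np.maximum(x, 0.0)
    return x[:, 0]  # linear output layer: one real value per input


# Hypothetical architecture: 192 -> 128 -> 64 -> 1 (untrained, random weights)
params = init_mlp([EMBED_DIM, 128, 64, 1])
fake_embeddings = np.random.default_rng(1).normal(size=(3, EMBED_DIM))
ages = predict_age(params, fake_embeddings)
print(ages.shape)  # (3,)
```

The linear output layer lets the network produce an unbounded real value, which suits age regression. With random weights the outputs are meaningless; a trained model would supply learned weights instead.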

## Training Data
The model was trained on the TIMIT dataset:
- High-quality studio recordings
- Single-channel, 16 kHz sampling rate
- Carefully controlled recording conditions
- Age annotations provided in the original dataset
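
Because TIMIT is single-channel 16 kHz audio and the model expects 16 kHz mono input, audio in other formats must be converted before embedding. The pipeline is described as handling this conversion itself. The NumPy sketch below only illustrates the two steps: averaging channels to mono, then resampling. The helper name `to_mono_16k` is hypothetical, and it uses plain linear interpolation with no anti-aliasing filter, so it is for illustration only. In practice, a proper resampler such as `torchaudio.functional.resample` is preferable.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate expected by the model


def to_mono_16k(audio, sr):
    """Downmix to mono and resample to 16 kHz with linear interpolation.

    audio: array of shape (num_samples,) or (num_samples, num_channels).
    Naive sketch: no low-pass filtering, so downsampling can alias.
    """
    audio = np.asarray(audio, dtype=np.float64)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average channels -> mono
    if sr == TARGET_SR:
        return audio
    n_out = int(round(len(audio) / sr * TARGET_SR))  # same duration at 16 kHz
    t_in = np.arange(len(audio)) / sr  # input sample times in seconds
    t_out = np.arange(n_out) / TARGET_SR  # output sample times in seconds
    return np.interp(t_out, t_in, audio)


# One second of 44.1 kHz stereo noise -> 16000 mono samples
stereo = np.random.default_rng(0).normal(size=(44_100, 2))
mono = to_mono_16k(stereo, 44_100)
print(mono.shape)  # (16000,)
```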

## Installation

```bash
pip install "git+https://github.com/griko/voice-age-regression.git[ann-ecapa-timit]"
```

## Usage

```python
from age_regressor import AgeRegressionPipeline

# Load the pipeline from the Hugging Face Hub
regressor = AgeRegressionPipeline.from_pretrained(
    "griko/age_reg_ann_ecapa_timit"
)

# Single-file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted age: {result[0]:.1f} years")

# Batch prediction: one predicted age per input file
results = regressor(["audio1.wav", "audio2.wav"])
print("Predicted ages:", ", ".join(f"{age:.1f}" for age in results), "years")
```
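
If you have recordings with known ages, you can check how well the model transfers to your own data by computing the MAE yourself. The helper below is a hypothetical sketch (`evaluate_mae` is not part of the package). It assumes the predictor behaves like the batch call to `regressor` in the usage example: it takes a list of paths and returns one predicted age per path. A stand-in predictor replaces the real pipeline so the example runs on its own.

```python
from typing import Callable, Sequence


def evaluate_mae(
    predict: Callable[[list], Sequence[float]],
    paths: Sequence[str],
    true_ages: Sequence[float],
) -> float:
    """Mean absolute error (in years) of `predict` on labeled audio files."""
    if not paths:
        raise ValueError("need at least one labeled file")
    if len(paths) != len(true_ages):
        raise ValueError("paths and true_ages must have the same length")
    predicted = list(predict(list(paths)))
    if len(predicted) != len(paths):
        raise ValueError("predict returned the wrong number of ages")
    # float() also accepts NumPy scalars and 0-d tensors
    errors = [abs(float(p) - float(t)) for p, t in zip(predicted, true_ages)]
    return sum(errors) / len(errors)


# Stand-in predictor for illustration; replace with the loaded `regressor`.
def fake_regressor(paths):
    return [30.0 for _ in paths]


print(evaluate_mae(fake_regressor, ["a.wav", "b.wav"], [25.0, 40.0]))  # 7.5
```

With the real pipeline, the call would be `evaluate_mae(regressor, paths, ages)`. Comparing the result with the 4.95-year TIMIT figure gives a rough sense of how well the model handles your recording conditions.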

## Limitations
- The model was trained on carefully controlled studio recordings
- Performance may vary with different audio quality or recording conditions
- Age predictions are estimates and should not be used for medical or legal purposes
- Age estimates should be treated as approximate values, not exact measurements

## Citation
If you use this model in your research, please cite:
```bibtex
TBD
```