---
language: multilingual
license: apache-2.0
datasets:
- timit
libraries:
- speechbrain
tags:
- age-estimation
- speaker-characteristics
- speaker-recognition
- audio-regression
- voice-analysis
---

# Age Estimation Model

This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an ANN regressor to predict speaker age from audio input. The model was trained on the TIMIT dataset.

## Model Performance Comparison

We provide several pre-trained models with different architectures and feature sets. The table below compares their test performance:

| Model | Architecture | Features | Training Data | Test MAE (years) | Best For |
|-------|--------------|----------|---------------|------------------|----------|
| VoxCeleb2 SVR (223) | SVR | ECAPA + Librosa (223-dim) | VoxCeleb2 | 7.88 | Best performance on VoxCeleb2 |
| VoxCeleb2 SVR (192) | SVR | ECAPA only (192-dim) | VoxCeleb2 | 7.89 | Lightweight deployment |
| TIMIT ANN (192) | ANN | ECAPA only (192-dim) | TIMIT | 4.95 | Clean studio recordings |
| Combined ANN (223) | ANN | ECAPA + Librosa (223-dim) | VoxCeleb2 + TIMIT | 6.93 | Best general performance |

Other models are available [here](https://huggingface.co/griko).

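
The Test MAE column reports mean absolute error: the average absolute difference, in years, between predicted and true ages. As a minimal sketch of how that metric is computed (the function name and the sample ages are illustrative and are not taken from the project code):

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Return the mean of |true - predicted| over paired age values."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("true_ages and predicted_ages must have the same length")
    if len(true_ages) == 0:
        raise ValueError("at least one pair of ages is required")
    total_error = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total_error / len(true_ages)


# Made-up values for illustration only: absolute errors are 3, 5 and 1 years.
example_mae = mean_absolute_error([25, 40, 31], [28.0, 35.0, 30.0])
print(f"MAE: {example_mae:.2f} years")
```

A lower MAE means predictions are, on average, closer to the annotated ages.
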
## Model Details
- Input: Audio file (will be converted to 16 kHz mono)
- Output: Predicted age in years (continuous value)
- Features: SpeechBrain ECAPA-TDNN embedding (192 features)
- Regressor: Artificial neural network (ANN) optimized with Optuna
- Performance:
  - TIMIT test set: 4.95 years Mean Absolute Error (MAE)

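
The input conversion listed above (mono, 16 kHz) can be illustrated with a dependency-free sketch. This is not the pipeline's own preprocessing code, which this card does not show: `to_mono` and `resample_linear` are hypothetical helpers, and linear interpolation is a deliberately simple stand-in for a proper resampler with anti-aliasing.

```python
def to_mono(frames):
    """Average the channels of each frame: [[left, right], ...] -> [mono, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    resampled = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate        # fractional position in the source signal
        lo = int(pos)                        # source sample at or before pos
        hi = min(lo + 1, len(samples) - 1)   # next source sample, clamped at the end
        frac = pos - lo
        resampled.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return resampled


# Illustrative stereo signal at 32 kHz, reduced to mono at 16 kHz.
stereo_32k = [[0.0, 0.0], [0.5, 1.5], [2.0, 2.0], [2.5, 3.5]]
mono_16k = resample_linear(to_mono(stereo_32k), src_rate=32_000)
```

Since the card states that the pipeline converts audio itself, a step like this should not be needed before calling it. The sketch only shows what the conversion means.
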
## Features
1. SpeechBrain ECAPA-TDNN embeddings (192 dimensions)

## Training Data
The model was trained on the TIMIT dataset:
- High-quality studio recordings
- Single channel, 16 kHz sampling rate
- Carefully controlled recording conditions
- Age annotations provided in the original dataset

## Installation

```bash
pip install git+https://github.com/griko/voice-age-regression.git[ann-ecapa-timit]
```

## Usage

```python
from age_regressor import AgeRegressionPipeline

# Load the pipeline
regressor = AgeRegressionPipeline.from_pretrained(
    "griko/age_reg_ann_ecapa_timit"
)

# Single file prediction
result = regressor("path/to/audio.wav")
print(f"Predicted age: {result[0]:.1f} years")

# Batch prediction
results = regressor(["audio1.wav", "audio2.wav"])
print(f"Predicted ages: {[f'{age:.1f}' for age in results]} years")
```

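
For batch use it can help to keep each prediction paired with its file. The helper below is a sketch under two assumptions suggested by the example above but not confirmed by this card: that the pipeline accepts a list of paths, and that it returns one age per path in the same order. `predict_ages` is a hypothetical name and is not part of the `age_regressor` package.

```python
def predict_ages(regressor, paths):
    """Map each audio path to its predicted age.

    Assumes regressor(list_of_paths) returns one numeric age per path,
    in input order (an assumption, not documented behaviour).
    """
    paths = list(paths)
    ages = regressor(paths)
    if len(ages) != len(paths):
        raise ValueError(
            f"expected {len(paths)} predictions, got {len(ages)}"
        )
    return {path: float(age) for path, age in zip(paths, ages)}
```

With the pipeline loaded as shown earlier, `predict_ages(regressor, ["audio1.wav", "audio2.wav"])` would return a dictionary keyed by file path.
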
## Limitations
- The model was trained on carefully controlled studio recordings
- Performance may vary on audio of different quality or recorded under different conditions
- Age predictions are estimates and should not be used for medical or legal purposes
- Treat predicted ages as approximate values, not exact measurements

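
One way to present a prediction as approximate is to report it with a band based on the test MAE. The sketch below uses the 4.95-year TIMIT test MAE reported in this card. The function names are hypothetical, and the band is only a rough indication of typical error on TIMIT-like audio, not a confidence interval.

```python
TIMIT_TEST_MAE = 4.95  # years, as reported for this model on the TIMIT test set


def approximate_age_range(predicted_age, mae=TIMIT_TEST_MAE):
    """Return (low, high) as predicted_age -/+ mae, with low clamped at 0."""
    low = max(0.0, predicted_age - mae)
    high = predicted_age + mae
    return low, high


def format_estimate(predicted_age, mae=TIMIT_TEST_MAE):
    """Format a prediction as a rounded value with a rough range."""
    low, high = approximate_age_range(predicted_age, mae)
    return f"about {predicted_age:.0f} years (roughly {low:.0f} to {high:.0f})"


print(format_estimate(30.0))
```

Individual errors can be well above the MAE, especially on audio that differs from the training conditions, so the band should not be read as a guarantee.
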
## Citation
If you use this model in your research, please cite:
```bibtex
TBD
```