griko committed on
Commit 1252249 · verified · 1 Parent(s): 25c1e4e

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +17 -22
  2. config.json +1 -1
  3. scaler.joblib +2 -2
  4. svr_model.joblib +3 -0
README.md CHANGED
@@ -6,33 +6,28 @@ datasets:
 libraries:
 - speechbrain
 tags:
- - gender-classification
 - speaker-characteristics
 - speaker-recognition
 - audio-classification
 - voice-analysis
 ---

- # Gender Classification Model

- This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an SVM classifier to predict speaker gender from audio input. The model was trained and evaluated on the VoxCeleb2, Mozilla Common Voice v10.0, and TIMIT datasets.

 ## Model Details
 - Input: Audio file (will be converted to 16kHz, mono, single channel)
- - Output: Gender prediction ("male" or "female")
 - Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
- - Classifier: Support Vector Machine optimized through Optuna (200 trials)
 - Performance:
- - VoxCeleb2 test set: 98.9% accuracy, 0.9885 F1-score
- - Mozilla Common Voice v10.0 English validated test set: 92.3% accuracy
- - TIMIT test set: 99.6% accuracy

 ## Training Data
 The model was trained on the VoxCeleb2 dataset:
- - Training set: 1,691 speakers (845 females, 846 males)
- - Validation set: 785 speakers (396 females, 389 males)
- - Test set: 1,647 speakers (828 females, 819 males)
- - No speaker overlap between sets
 - Audio preprocessing:
 - Converted to WAV format, single channel, 16kHz sampling rate, 256 kb/s bitrate
 - Applied SileroVAD for voice activity detection, taking the first voiced segment
@@ -42,35 +37,35 @@ The model was trained on VoxCeleb2 dataset:
 You can install the package directly from GitHub:

 ```bash
- pip install git+https://github.com/griko/voice-gender-classification.git
 ```

 ## Usage

 ```python
- from voice_gender_classification import GenderClassificationPipeline

 # Load the pipeline
- classifier = GenderClassificationPipeline.from_pretrained(
-     "griko/gender_cls_svm_ecapa_voxceleb"
 )

 # Single file prediction
- result = classifier("path/to/audio.wav")
- print(result)  # ["female"] or ["male"]

 # Batch prediction
- results = classifier(["audio1.wav", "audio2.wav"])
- print(results)  # ["female", "male"]
 ```

 ## Limitations
 - Model was trained on celebrity voices from YouTube interviews
 - Performance may vary on different audio qualities or recording conditions
- - Designed for binary gender classification only

 ## Citation
 If you use this model in your research, please cite:
 ```bibtex
 TBD
- ```
 
 libraries:
 - speechbrain
 tags:
+ - height-estimation
 - speaker-characteristics
 - speaker-recognition
 - audio-classification
 - voice-analysis
 ---

+ # Height Estimation Model

+ This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an SVR regressor to predict speaker height from audio input. The model was trained on the VoxCeleb2 dataset and evaluated on the VoxCeleb2 and TIMIT datasets.

 ## Model Details
 - Input: Audio file (will be converted to 16kHz, mono, single channel)
+ - Output: Predicted height in centimeters (continuous value)
 - Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
+ - Regressor: Support Vector Regression (SVR) optimized through Optuna
 - Performance:
+ - VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
+ - TIMIT test set: 6.02 cm Mean Absolute Error (MAE)
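
For context, MAE is the mean absolute difference between predicted and true heights, so an MAE of about 6 cm means predictions are off by roughly 6 cm on average. A quick illustration with hypothetical values:

```python
import numpy as np

# Illustrative numbers only, not model outputs
y_true = np.array([170.0, 182.0, 165.0])   # true heights (cm)
y_pred = np.array([175.2, 176.8, 168.1])   # predicted heights (cm)
mae_cm = np.mean(np.abs(y_pred - y_true))  # (5.2 + 5.2 + 3.1) / 3 = 4.5 cm
```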
 
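The files in this commit (scaler.joblib, svr_model.joblib) suggest the pipeline is: ECAPA-TDNN embedding → feature scaling → SVR. Below is a minimal sketch of that flow, assuming the scaler and regressor are scikit-learn objects serialized with joblib; the packaged pipeline may handle preprocessing differently.

```python
# Hedged sketch: reproduce the embedding -> scaler -> SVR flow by hand.
import joblib
import torchaudio
from huggingface_hub import hf_hub_download
from speechbrain.pretrained import EncoderClassifier  # speechbrain.inference.speaker in newer releases

REPO = "griko/height_reg_svr_ecapa_voxceleb"
scaler = joblib.load(hf_hub_download(REPO, "scaler.joblib"))
svr = joblib.load(hf_hub_download(REPO, "svr_model.joblib"))

# Pretrained ECAPA-TDNN speaker encoder (192-dim embeddings)
encoder = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")

# Load audio, downmix to mono, resample to 16 kHz
wav, sr = torchaudio.load("path/to/audio.wav")
wav = wav.mean(dim=0, keepdim=True)
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)

# 192-dim embedding -> standardize -> height in cm
emb = encoder.encode_batch(wav).squeeze().detach().cpu().numpy().reshape(1, -1)
height_cm = svr.predict(scaler.transform(emb))[0]
print(f"{height_cm:.1f} cm")
```
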
 ## Training Data
 The model was trained on the VoxCeleb2 dataset:
 - Audio preprocessing:
 - Converted to WAV format, single channel, 16kHz sampling rate, 256 kb/s bitrate
 - Applied SileroVAD for voice activity detection, taking the first voiced segment (see the sketch below)
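
A minimal sketch of that VAD step, assuming the public torch.hub distribution of Silero VAD (the packaged pipeline may wire this differently):

```python
# Hedged sketch: 16 kHz mono conversion, then keep the first voiced segment.
import torch
import torchaudio

vad_model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps = utils[0]  # first helper in the returned utils tuple

wav, sr = torchaudio.load("path/to/audio.wav")
wav = wav.mean(dim=0)  # downmix to mono
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)

# Timestamps are sample indices at 16 kHz; assumes at least one voiced segment
speech = get_speech_timestamps(wav, vad_model, sampling_rate=16000)
first_segment = wav[speech[0]["start"]:speech[0]["end"]]
```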
 
 You can install the package directly from GitHub:

 ```bash
+ pip install git+https://github.com/griko/voice-height-regression.git
 ```

 ## Usage

 ```python
+ from height_regressor import HeightRegressionPipeline

 # Load the pipeline
+ regressor = HeightRegressionPipeline.from_pretrained(
+     "griko/height_reg_svr_ecapa_voxceleb"
 )

 # Single file prediction
+ result = regressor("path/to/audio.wav")
+ print(f"Predicted height: {result[0]:.1f} cm")

 # Batch prediction
+ results = regressor(["audio1.wav", "audio2.wav"])
+ print(f"Predicted heights: {[f'{h:.1f}' for h in results]} cm")
 ```

 ## Limitations
 - Model was trained on celebrity voices from YouTube interviews
 - Performance may vary on different audio qualities or recording conditions
+ - Height predictions are estimates and should not be used for medical or legal purposes

 ## Citation
 If you use this model in your research, please cite:
 ```bibtex
 TBD
+ ```
config.json CHANGED
@@ -1 +1 @@
- {"labels": ["female", "male"], "feature_names": ["0_speechbrain_embedding", "1_speechbrain_embedding", "2_speechbrain_embedding", "3_speechbrain_embedding", "4_speechbrain_embedding", "5_speechbrain_embedding", "6_speechbrain_embedding", "7_speechbrain_embedding", "8_speechbrain_embedding", "9_speechbrain_embedding", "10_speechbrain_embedding", "11_speechbrain_embedding", "12_speechbrain_embedding", "13_speechbrain_embedding", "14_speechbrain_embedding", "15_speechbrain_embedding", "16_speechbrain_embedding", "17_speechbrain_embedding", "18_speechbrain_embedding", "19_speechbrain_embedding", "20_speechbrain_embedding", "21_speechbrain_embedding", "22_speechbrain_embedding", "23_speechbrain_embedding", "24_speechbrain_embedding", "25_speechbrain_embedding", "26_speechbrain_embedding", "27_speechbrain_embedding", "28_speechbrain_embedding", "29_speechbrain_embedding", "30_speechbrain_embedding", "31_speechbrain_embedding", "32_speechbrain_embedding", "33_speechbrain_embedding", "34_speechbrain_embedding", "35_speechbrain_embedding", "36_speechbrain_embedding", "37_speechbrain_embedding", "38_speechbrain_embedding", "39_speechbrain_embedding", "40_speechbrain_embedding", "41_speechbrain_embedding", "42_speechbrain_embedding", "43_speechbrain_embedding", "44_speechbrain_embedding", "45_speechbrain_embedding", "46_speechbrain_embedding", "47_speechbrain_embedding", "48_speechbrain_embedding", "49_speechbrain_embedding", "50_speechbrain_embedding", "51_speechbrain_embedding", "52_speechbrain_embedding", "53_speechbrain_embedding", "54_speechbrain_embedding", "55_speechbrain_embedding", "56_speechbrain_embedding", "57_speechbrain_embedding", "58_speechbrain_embedding", "59_speechbrain_embedding", "60_speechbrain_embedding", "61_speechbrain_embedding", "62_speechbrain_embedding", "63_speechbrain_embedding", "64_speechbrain_embedding", "65_speechbrain_embedding", "66_speechbrain_embedding", "67_speechbrain_embedding", "68_speechbrain_embedding", "69_speechbrain_embedding", "70_speechbrain_embedding", "71_speechbrain_embedding", "72_speechbrain_embedding", "73_speechbrain_embedding", "74_speechbrain_embedding", "75_speechbrain_embedding", "76_speechbrain_embedding", "77_speechbrain_embedding", "78_speechbrain_embedding", "79_speechbrain_embedding", "80_speechbrain_embedding", "81_speechbrain_embedding", "82_speechbrain_embedding", "83_speechbrain_embedding", "84_speechbrain_embedding", "85_speechbrain_embedding", "86_speechbrain_embedding", "87_speechbrain_embedding", "88_speechbrain_embedding", "89_speechbrain_embedding", "90_speechbrain_embedding", "91_speechbrain_embedding", "92_speechbrain_embedding", "93_speechbrain_embedding", "94_speechbrain_embedding", "95_speechbrain_embedding", "96_speechbrain_embedding", "97_speechbrain_embedding", "98_speechbrain_embedding", "99_speechbrain_embedding", "100_speechbrain_embedding", "101_speechbrain_embedding", "102_speechbrain_embedding", "103_speechbrain_embedding", "104_speechbrain_embedding", "105_speechbrain_embedding", "106_speechbrain_embedding", "107_speechbrain_embedding", "108_speechbrain_embedding", "109_speechbrain_embedding", "110_speechbrain_embedding", "111_speechbrain_embedding", "112_speechbrain_embedding", "113_speechbrain_embedding", "114_speechbrain_embedding", "115_speechbrain_embedding", "116_speechbrain_embedding", "117_speechbrain_embedding", "118_speechbrain_embedding", "119_speechbrain_embedding", "120_speechbrain_embedding", "121_speechbrain_embedding", "122_speechbrain_embedding", "123_speechbrain_embedding", 
"124_speechbrain_embedding", "125_speechbrain_embedding", "126_speechbrain_embedding", "127_speechbrain_embedding", "128_speechbrain_embedding", "129_speechbrain_embedding", "130_speechbrain_embedding", "131_speechbrain_embedding", "132_speechbrain_embedding", "133_speechbrain_embedding", "134_speechbrain_embedding", "135_speechbrain_embedding", "136_speechbrain_embedding", "137_speechbrain_embedding", "138_speechbrain_embedding", "139_speechbrain_embedding", "140_speechbrain_embedding", "141_speechbrain_embedding", "142_speechbrain_embedding", "143_speechbrain_embedding", "144_speechbrain_embedding", "145_speechbrain_embedding", "146_speechbrain_embedding", "147_speechbrain_embedding", "148_speechbrain_embedding", "149_speechbrain_embedding", "150_speechbrain_embedding", "151_speechbrain_embedding", "152_speechbrain_embedding", "153_speechbrain_embedding", "154_speechbrain_embedding", "155_speechbrain_embedding", "156_speechbrain_embedding", "157_speechbrain_embedding", "158_speechbrain_embedding", "159_speechbrain_embedding", "160_speechbrain_embedding", "161_speechbrain_embedding", "162_speechbrain_embedding", "163_speechbrain_embedding", "164_speechbrain_embedding", "165_speechbrain_embedding", "166_speechbrain_embedding", "167_speechbrain_embedding", "168_speechbrain_embedding", "169_speechbrain_embedding", "170_speechbrain_embedding", "171_speechbrain_embedding", "172_speechbrain_embedding", "173_speechbrain_embedding", "174_speechbrain_embedding", "175_speechbrain_embedding", "176_speechbrain_embedding", "177_speechbrain_embedding", "178_speechbrain_embedding", "179_speechbrain_embedding", "180_speechbrain_embedding", "181_speechbrain_embedding", "182_speechbrain_embedding", "183_speechbrain_embedding", "184_speechbrain_embedding", "185_speechbrain_embedding", "186_speechbrain_embedding", "187_speechbrain_embedding", "188_speechbrain_embedding", "189_speechbrain_embedding", "190_speechbrain_embedding", "191_speechbrain_embedding"]}
 
+ {"feature_names": ["0_speechbrain_embedding", "1_speechbrain_embedding", "2_speechbrain_embedding", "3_speechbrain_embedding", "4_speechbrain_embedding", "5_speechbrain_embedding", "6_speechbrain_embedding", "7_speechbrain_embedding", "8_speechbrain_embedding", "9_speechbrain_embedding", "10_speechbrain_embedding", "11_speechbrain_embedding", "12_speechbrain_embedding", "13_speechbrain_embedding", "14_speechbrain_embedding", "15_speechbrain_embedding", "16_speechbrain_embedding", "17_speechbrain_embedding", "18_speechbrain_embedding", "19_speechbrain_embedding", "20_speechbrain_embedding", "21_speechbrain_embedding", "22_speechbrain_embedding", "23_speechbrain_embedding", "24_speechbrain_embedding", "25_speechbrain_embedding", "26_speechbrain_embedding", "27_speechbrain_embedding", "28_speechbrain_embedding", "29_speechbrain_embedding", "30_speechbrain_embedding", "31_speechbrain_embedding", "32_speechbrain_embedding", "33_speechbrain_embedding", "34_speechbrain_embedding", "35_speechbrain_embedding", "36_speechbrain_embedding", "37_speechbrain_embedding", "38_speechbrain_embedding", "39_speechbrain_embedding", "40_speechbrain_embedding", "41_speechbrain_embedding", "42_speechbrain_embedding", "43_speechbrain_embedding", "44_speechbrain_embedding", "45_speechbrain_embedding", "46_speechbrain_embedding", "47_speechbrain_embedding", "48_speechbrain_embedding", "49_speechbrain_embedding", "50_speechbrain_embedding", "51_speechbrain_embedding", "52_speechbrain_embedding", "53_speechbrain_embedding", "54_speechbrain_embedding", "55_speechbrain_embedding", "56_speechbrain_embedding", "57_speechbrain_embedding", "58_speechbrain_embedding", "59_speechbrain_embedding", "60_speechbrain_embedding", "61_speechbrain_embedding", "62_speechbrain_embedding", "63_speechbrain_embedding", "64_speechbrain_embedding", "65_speechbrain_embedding", "66_speechbrain_embedding", "67_speechbrain_embedding", "68_speechbrain_embedding", "69_speechbrain_embedding", "70_speechbrain_embedding", "71_speechbrain_embedding", "72_speechbrain_embedding", "73_speechbrain_embedding", "74_speechbrain_embedding", "75_speechbrain_embedding", "76_speechbrain_embedding", "77_speechbrain_embedding", "78_speechbrain_embedding", "79_speechbrain_embedding", "80_speechbrain_embedding", "81_speechbrain_embedding", "82_speechbrain_embedding", "83_speechbrain_embedding", "84_speechbrain_embedding", "85_speechbrain_embedding", "86_speechbrain_embedding", "87_speechbrain_embedding", "88_speechbrain_embedding", "89_speechbrain_embedding", "90_speechbrain_embedding", "91_speechbrain_embedding", "92_speechbrain_embedding", "93_speechbrain_embedding", "94_speechbrain_embedding", "95_speechbrain_embedding", "96_speechbrain_embedding", "97_speechbrain_embedding", "98_speechbrain_embedding", "99_speechbrain_embedding", "100_speechbrain_embedding", "101_speechbrain_embedding", "102_speechbrain_embedding", "103_speechbrain_embedding", "104_speechbrain_embedding", "105_speechbrain_embedding", "106_speechbrain_embedding", "107_speechbrain_embedding", "108_speechbrain_embedding", "109_speechbrain_embedding", "110_speechbrain_embedding", "111_speechbrain_embedding", "112_speechbrain_embedding", "113_speechbrain_embedding", "114_speechbrain_embedding", "115_speechbrain_embedding", "116_speechbrain_embedding", "117_speechbrain_embedding", "118_speechbrain_embedding", "119_speechbrain_embedding", "120_speechbrain_embedding", "121_speechbrain_embedding", "122_speechbrain_embedding", "123_speechbrain_embedding", "124_speechbrain_embedding", 
"125_speechbrain_embedding", "126_speechbrain_embedding", "127_speechbrain_embedding", "128_speechbrain_embedding", "129_speechbrain_embedding", "130_speechbrain_embedding", "131_speechbrain_embedding", "132_speechbrain_embedding", "133_speechbrain_embedding", "134_speechbrain_embedding", "135_speechbrain_embedding", "136_speechbrain_embedding", "137_speechbrain_embedding", "138_speechbrain_embedding", "139_speechbrain_embedding", "140_speechbrain_embedding", "141_speechbrain_embedding", "142_speechbrain_embedding", "143_speechbrain_embedding", "144_speechbrain_embedding", "145_speechbrain_embedding", "146_speechbrain_embedding", "147_speechbrain_embedding", "148_speechbrain_embedding", "149_speechbrain_embedding", "150_speechbrain_embedding", "151_speechbrain_embedding", "152_speechbrain_embedding", "153_speechbrain_embedding", "154_speechbrain_embedding", "155_speechbrain_embedding", "156_speechbrain_embedding", "157_speechbrain_embedding", "158_speechbrain_embedding", "159_speechbrain_embedding", "160_speechbrain_embedding", "161_speechbrain_embedding", "162_speechbrain_embedding", "163_speechbrain_embedding", "164_speechbrain_embedding", "165_speechbrain_embedding", "166_speechbrain_embedding", "167_speechbrain_embedding", "168_speechbrain_embedding", "169_speechbrain_embedding", "170_speechbrain_embedding", "171_speechbrain_embedding", "172_speechbrain_embedding", "173_speechbrain_embedding", "174_speechbrain_embedding", "175_speechbrain_embedding", "176_speechbrain_embedding", "177_speechbrain_embedding", "178_speechbrain_embedding", "179_speechbrain_embedding", "180_speechbrain_embedding", "181_speechbrain_embedding", "182_speechbrain_embedding", "183_speechbrain_embedding", "184_speechbrain_embedding", "185_speechbrain_embedding", "186_speechbrain_embedding", "187_speechbrain_embedding", "188_speechbrain_embedding", "189_speechbrain_embedding", "190_speechbrain_embedding", "191_speechbrain_embedding"]}
scaler.joblib CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:4e44e58d1e6602f61913b53f65bcc328e400a4c33a97410c543a5f1e2f357651
- size 25165

 version https://git-lfs.github.com/spec/v1
+ oid sha256:f2d7b9497213c91acc99e733a887de9a15959b9b7af412638f85ac9e57c11337
+ size 11559
svr_model.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8dc5204d15b3682d705ffcd25a81db4e185e161c55e10f63ee07db59636443a6
+ size 38068815