Commit 350d4c0 by griko · verified · 1 parent: 1ec9b6f

Upload folder using huggingface_hub

Files changed (5)
  1. README.md +71 -0
  2. config.json +1 -0
  3. requirements.txt +6 -0
  4. scaler.joblib +3 -0
  5. svr_model.joblib +3 -0
README.md ADDED
@@ -0,0 +1,71 @@
+ ---
+ language: multilingual
+ license: apache-2.0
+ datasets:
+ - voxceleb2
+ libraries:
+ - speechbrain
+ tags:
+ - height-estimation
+ - speaker-characteristics
+ - speaker-recognition
+ - audio-classification
+ - voice-analysis
+ ---
+
+ # Height Estimation Model
+
+ This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with a Support Vector Regression (SVR) regressor to predict speaker height from audio input. It was trained on VoxCeleb2 and evaluated on the VoxCeleb2 and TIMIT test sets.
+
+ ## Model Details
+ - Input: Audio file (converted to 16 kHz, single channel)
+ - Output: Predicted height in centimeters (continuous value)
+ - Speaker embedding: 192-dimensional ECAPA-TDNN embedding from SpeechBrain
+ - Regressor: Support Vector Regression (SVR) with hyperparameters tuned using Optuna
+ - Performance:
+   - VoxCeleb2 test set: 6.01 cm Mean Absolute Error (MAE)
+   - TIMIT test set: 6.02 cm Mean Absolute Error (MAE)
+
+ ## Training Data
+ The model was trained on the VoxCeleb2 dataset:
+ - Audio preprocessing:
+   - Converted to WAV format, single channel, 16 kHz sampling rate, 256 kbps bitrate
+   - Applied SileroVAD for voice activity detection, taking the first voiced segment
+
+ ## Installation
+
+ You can install the package directly from GitHub:
+
+ ```bash
+ pip install git+https://github.com/griko/voice-height-regression.git
+ ```
+
+ ## Usage
+
+ ```python
+ from height_regressor import HeightRegressionPipeline
+
+ # Load the pipeline
+ regressor = HeightRegressionPipeline.from_pretrained(
+     "griko/height_reg_svr_ecapa_voxceleb"
+ )
+
+ # Single file prediction
+ result = regressor("path/to/audio.wav")
+ print(f"Predicted height: {result[0]:.1f} cm")
+
+ # Batch prediction
+ results = regressor(["audio1.wav", "audio2.wav"])
+ print(f"Predicted heights: {[f'{h:.1f}' for h in results]} cm")
+ ```
+
+ ## Limitations
+ - The model was trained on celebrity voices from YouTube interviews
+ - Performance may vary with audio quality and recording conditions
+ - Height predictions are estimates and should not be used for medical or legal purposes
+
+ ## Citation
+ If you use this model in your research, please cite:
+ ```bibtex
+ TBD
+ ```
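
The README reports accuracy as Mean Absolute Error (MAE) in centimeters. As a minimal sketch of how that metric is computed, the snippet below uses made-up heights. The values and the helper name `mean_absolute_error_cm` are illustrative assumptions, not data or code from this repository.

```python
def mean_absolute_error_cm(true_heights, predicted_heights):
    """Average absolute difference between true and predicted heights, in cm."""
    if len(true_heights) != len(predicted_heights):
        raise ValueError("inputs must have the same length")
    if not true_heights:
        raise ValueError("inputs must not be empty")
    total = sum(abs(t - p) for t, p in zip(true_heights, predicted_heights))
    return total / len(true_heights)


# Hypothetical values for illustration only (not from VoxCeleb2 or TIMIT).
true_heights = [170.0, 182.0, 165.0, 178.0]
predicted_heights = [175.0, 176.0, 169.0, 171.0]

mae = mean_absolute_error_cm(true_heights, predicted_heights)
print(f"MAE: {mae:.2f} cm")  # absolute errors 5, 6, 4, 7 -> MAE: 5.50 cm
```

Read this way, the reported MAE of about 6 cm means that predictions are off by about 6 cm on average across the test speakers.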
config.json ADDED
@@ -0,0 +1 @@
+ {"feature_names": ["0_speechbrain_embedding", "1_speechbrain_embedding", "2_speechbrain_embedding", "3_speechbrain_embedding", "4_speechbrain_embedding", "5_speechbrain_embedding", "6_speechbrain_embedding", "7_speechbrain_embedding", "8_speechbrain_embedding", "9_speechbrain_embedding", "10_speechbrain_embedding", "11_speechbrain_embedding", "12_speechbrain_embedding", "13_speechbrain_embedding", "14_speechbrain_embedding", "15_speechbrain_embedding", "16_speechbrain_embedding", "17_speechbrain_embedding", "18_speechbrain_embedding", "19_speechbrain_embedding", "20_speechbrain_embedding", "21_speechbrain_embedding", "22_speechbrain_embedding", "23_speechbrain_embedding", "24_speechbrain_embedding", "25_speechbrain_embedding", "26_speechbrain_embedding", "27_speechbrain_embedding", "28_speechbrain_embedding", "29_speechbrain_embedding", "30_speechbrain_embedding", "31_speechbrain_embedding", "32_speechbrain_embedding", "33_speechbrain_embedding", "34_speechbrain_embedding", "35_speechbrain_embedding", "36_speechbrain_embedding", "37_speechbrain_embedding", "38_speechbrain_embedding", "39_speechbrain_embedding", "40_speechbrain_embedding", "41_speechbrain_embedding", "42_speechbrain_embedding", "43_speechbrain_embedding", "44_speechbrain_embedding", "45_speechbrain_embedding", "46_speechbrain_embedding", "47_speechbrain_embedding", "48_speechbrain_embedding", "49_speechbrain_embedding", "50_speechbrain_embedding", "51_speechbrain_embedding", "52_speechbrain_embedding", "53_speechbrain_embedding", "54_speechbrain_embedding", "55_speechbrain_embedding", "56_speechbrain_embedding", "57_speechbrain_embedding", "58_speechbrain_embedding", "59_speechbrain_embedding", "60_speechbrain_embedding", "61_speechbrain_embedding", "62_speechbrain_embedding", "63_speechbrain_embedding", "64_speechbrain_embedding", "65_speechbrain_embedding", "66_speechbrain_embedding", "67_speechbrain_embedding", "68_speechbrain_embedding", "69_speechbrain_embedding", "70_speechbrain_embedding", 
"71_speechbrain_embedding", "72_speechbrain_embedding", "73_speechbrain_embedding", "74_speechbrain_embedding", "75_speechbrain_embedding", "76_speechbrain_embedding", "77_speechbrain_embedding", "78_speechbrain_embedding", "79_speechbrain_embedding", "80_speechbrain_embedding", "81_speechbrain_embedding", "82_speechbrain_embedding", "83_speechbrain_embedding", "84_speechbrain_embedding", "85_speechbrain_embedding", "86_speechbrain_embedding", "87_speechbrain_embedding", "88_speechbrain_embedding", "89_speechbrain_embedding", "90_speechbrain_embedding", "91_speechbrain_embedding", "92_speechbrain_embedding", "93_speechbrain_embedding", "94_speechbrain_embedding", "95_speechbrain_embedding", "96_speechbrain_embedding", "97_speechbrain_embedding", "98_speechbrain_embedding", "99_speechbrain_embedding", "100_speechbrain_embedding", "101_speechbrain_embedding", "102_speechbrain_embedding", "103_speechbrain_embedding", "104_speechbrain_embedding", "105_speechbrain_embedding", "106_speechbrain_embedding", "107_speechbrain_embedding", "108_speechbrain_embedding", "109_speechbrain_embedding", "110_speechbrain_embedding", "111_speechbrain_embedding", "112_speechbrain_embedding", "113_speechbrain_embedding", "114_speechbrain_embedding", "115_speechbrain_embedding", "116_speechbrain_embedding", "117_speechbrain_embedding", "118_speechbrain_embedding", "119_speechbrain_embedding", "120_speechbrain_embedding", "121_speechbrain_embedding", "122_speechbrain_embedding", "123_speechbrain_embedding", "124_speechbrain_embedding", "125_speechbrain_embedding", "126_speechbrain_embedding", "127_speechbrain_embedding", "128_speechbrain_embedding", "129_speechbrain_embedding", "130_speechbrain_embedding", "131_speechbrain_embedding", "132_speechbrain_embedding", "133_speechbrain_embedding", "134_speechbrain_embedding", "135_speechbrain_embedding", "136_speechbrain_embedding", "137_speechbrain_embedding", "138_speechbrain_embedding", "139_speechbrain_embedding", 
"140_speechbrain_embedding", "141_speechbrain_embedding", "142_speechbrain_embedding", "143_speechbrain_embedding", "144_speechbrain_embedding", "145_speechbrain_embedding", "146_speechbrain_embedding", "147_speechbrain_embedding", "148_speechbrain_embedding", "149_speechbrain_embedding", "150_speechbrain_embedding", "151_speechbrain_embedding", "152_speechbrain_embedding", "153_speechbrain_embedding", "154_speechbrain_embedding", "155_speechbrain_embedding", "156_speechbrain_embedding", "157_speechbrain_embedding", "158_speechbrain_embedding", "159_speechbrain_embedding", "160_speechbrain_embedding", "161_speechbrain_embedding", "162_speechbrain_embedding", "163_speechbrain_embedding", "164_speechbrain_embedding", "165_speechbrain_embedding", "166_speechbrain_embedding", "167_speechbrain_embedding", "168_speechbrain_embedding", "169_speechbrain_embedding", "170_speechbrain_embedding", "171_speechbrain_embedding", "172_speechbrain_embedding", "173_speechbrain_embedding", "174_speechbrain_embedding", "175_speechbrain_embedding", "176_speechbrain_embedding", "177_speechbrain_embedding", "178_speechbrain_embedding", "179_speechbrain_embedding", "180_speechbrain_embedding", "181_speechbrain_embedding", "182_speechbrain_embedding", "183_speechbrain_embedding", "184_speechbrain_embedding", "185_speechbrain_embedding", "186_speechbrain_embedding", "187_speechbrain_embedding", "188_speechbrain_embedding", "189_speechbrain_embedding", "190_speechbrain_embedding", "191_speechbrain_embedding"]}
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ scikit-learn
+ pandas
+ soundfile
+ speechbrain
+ torch
+ torchaudio
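
None of these requirements is pinned, and objects stored with joblib (such as the scaler and SVR in this commit) can fail to load or emit warnings under a different scikit-learn version than the one that saved them. The sketch below, which uses only the standard library, lists which of the packages are installed and at what version. The helper name `installed_versions` is hypothetical and not part of this repository.

```python
from importlib import metadata

# Package names as listed in requirements.txt.
REQUIRED = ["scikit-learn", "pandas", "soundfile", "speechbrain", "torch", "torchaudio"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```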
scaler.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f2d7b9497213c91acc99e733a887de9a15959b9b7af412638f85ac9e57c11337
+ size 11559
svr_model.joblib ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8dc5204d15b3682d705ffcd25a81db4e185e161c55e10f63ee07db59636443a6
+ size 38068815
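
The two `.joblib` entries are Git LFS pointer files rather than the binaries themselves. Each one records the SHA-256 digest and the size in bytes of the real file. The following is a minimal sketch of how a downloaded file could be checked against such a pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of Git LFS or this repository.

```python
import hashlib


def parse_lfs_pointer(pointer_text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in pointer_text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


def matches_pointer(data, pointer):
    """Check downloaded bytes against the pointer's size and SHA-256 oid."""
    algorithm, _, expected_digest = pointer["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    actual_digest = hashlib.sha256(data).hexdigest()
    return len(data) == pointer["size"] and actual_digest == expected_digest


# Pointer contents copied from the scaler.joblib entry above.
scaler_pointer = parse_lfs_pointer(
    """version https://git-lfs.github.com/spec/v1
oid sha256:f2d7b9497213c91acc99e733a887de9a15959b9b7af412638f85ac9e57c11337
size 11559
"""
)
print(scaler_pointer["size"])
```

In practice, `data` would be the bytes read from the downloaded `scaler.joblib` or `svr_model.joblib`, for example via `open(path, "rb").read()`.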