### Dataset: ~2000 hours of speech and vocals
### Supported languages (approximate hours of data):
~800 hrs of English (with a vast variety of speakers and emotions)
~200 hrs of Spanish
~42 hrs of French
~188 hrs of Russian
~70 hrs of Arabic
~140 hrs of Japanese
~70 hrs of Chinese (Mandarin)
~80 hrs of Korean
~30 hrs of Hindi
~53 hrs of Indonesian
~30 hrs of Tagalog
~40 hrs of Portuguese
~35 hrs of German
~190 hrs of singing (all languages)
common language (amount of data not recorded)
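
The per-language figures can be cross-checked against the stated ~2000-hour total with a quick sum. This is a minimal sketch, assuming every number in the list is a count of hours and leaving out the "common language" entry, whose size is not given. The names `hours`, `total_hours`, and `share` are illustrative only and are not part of any released tooling.

```python
# Approximate hours per category, copied from the list above.
# The "common language" entry is omitted because its size is unknown.
hours = {
    "English": 800,
    "Spanish": 200,
    "French": 42,
    "Russian": 188,
    "Arabic": 70,
    "Japanese": 140,
    "Chinese (Mandarin)": 70,
    "Korean": 80,
    "Hindi": 30,
    "Indonesian": 53,
    "Tagalog": 30,
    "Portuguese": 40,
    "German": 35,
    "Singing (all languages)": 190,
}


def total_hours(table):
    """Sum of all listed categories, in hours."""
    return sum(table.values())


def share(table, key):
    """Percentage of the listed total contributed by one category."""
    return 100.0 * table[key] / total_hours(table)


if __name__ == "__main__":
    print(f"Listed total: ~{total_hours(hours)} hrs")
    # Largest categories first.
    for name, h in sorted(hours.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<26} ~{h:>4} hrs  {share(hours, name):5.1f}%")
```

The listed categories add up to 1968 hours, so the remainder of the ~2000-hour figure is consistent with the unquantified "common language" data.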
### Type: big-base for finetuning
Batch: 2-40-80
### Sampling frequency: 32 kHz and 40 kHz
Total step count: 371406
### Hardware used:
1x H100, 4x L40s
Expected release date: 22 July
