Abror Shopulatov PRO

murodbek

https://murodbek.substack.com/

AI & ML interests

Machine Learning, NLP, Grammatical Error Correction

Recent Activity

reacted to hexgrad's post with 🔥 4 days ago

Wanted: Peak Data. I'm collecting audio data to train another TTS model:
+ AVM data: ChatGPT Advanced Voice Mode audio & text from source
+ Professional audio: Permissive (CC0, Apache, MIT, CC-BY)

This audio should *impress* most native speakers, not just barely pass their audio Turing tests. Professional-caliber means S or A-tier, not your average bloke off the street. Traditional TTS may not make the cut. Absolutely no low-fi microphone recordings like Common Voice.

The bar is much higher than last time, so there are no timelines yet and I expect it may take longer to collect such mythical data. Raising the bar means evicting quite a bit of old data, and voice/language availability may decrease. The theme is *quality* over quantity. I would rather have 1 hour of A/S-tier than 100 hours of mid data.

I have nothing to offer but the north star of a future Apache 2.0 TTS model, so prefer data that you *already have* and costs you *nothing extra* to send. Additionally, *all* the new data may be used to construct public, Apache 2.0 voicepacks, and if that arrangement doesn't work for you, no need to send any audio.

Last time I asked for horses; now I'm asking for unicorns. As of writing this post, I've currently got a few English & Chinese unicorns, but there is plenty of room in the stable. Find me over on Discord at `rzvzn`: https://discord.gg/QuGxSWBfQy

liked a dataset 11 days ago

DavronSherbaev/uzbekvoice-filtered

liked a model 11 days ago

deepseek-ai/DeepSeek-R1

Papers 1

arxiv:2409.04269

spaces 1

Mmlu Lite Uz

Review of MMLU-Lite-uz

models 3

murodbek/uzroberta-panx-uz

Token Classification • Updated Aug 9, 2023 • 170

murodbek/xlm-roberta-panx-uz

Token Classification • Updated Apr 13, 2023 • 114

murodbek/uzroberta-sentiment-analysis

Text Classification • Updated Apr 11, 2023 • 29

datasets 5

murodbek/uzlib

Viewer • Updated 16 days ago • 1.86k • 94

murodbek/uzbek-speech-corpus

Viewer • Updated 19 days ago • 108k • 113 • 1

murodbek/Global-MMLU-uz

Viewer • Updated Dec 21, 2024 • 14.3k • 49

murodbek/Global-MMLU-Lite-uz

Viewer • Updated Dec 15, 2024 • 685 • 50

murodbek/uz-text-classification

Viewer • Updated Oct 31, 2023 • 513k • 106 • 2