prince-canuma committed · Commit 15cd2f0 · verified · 1 Parent(s): fd1831b

Initial upload

This view is limited to 50 files because it contains too many changes. See raw diff.
.gitattributes CHANGED
@@ -33,3 +33,10 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ samples/af_heart_3.wav filter=lfs diff=lfs merge=lfs -text
+ samples/af_heart_4.wav filter=lfs diff=lfs merge=lfs -text
+ samples/af_heart_5.wav filter=lfs diff=lfs merge=lfs -text
+ samples/HEARME.wav filter=lfs diff=lfs merge=lfs -text
+ samples/af_heart_0.wav filter=lfs diff=lfs merge=lfs -text
+ samples/af_heart_1.wav filter=lfs diff=lfs merge=lfs -text
+ samples/af_heart_2.wav filter=lfs diff=lfs merge=lfs -text
DONATE.md ADDED
@@ -0,0 +1,47 @@
+ # Donate
+
+ Apache software is free software.
+
+ For those able & willing to support my work, I can be sponsored via [GitHub Sponsors](https://github.com/sponsors/hexgrad). I also accept *unconditional, no-strings-attached* donations in the form of **GPU cloud credit**. This helps me run experiments and train models.
+
+ Please do not "attach strings" (e.g. model requests, consulting, brand sponsorships) to donations, because some of these channels are anonymous: I may not be able to see who sent what.
+
+ Also, along the lines of "Never buy a product based on the future promise of updates", I would discourage you from donating because you expect a specific model to come down the pipeline. Hopefully, donors broadly believe that good things happen when [someone gets this man (yours truly) a GPU](https://i.redd.it/r8dtt3n9rc431.jpg).
+
+ ### GitHub Sponsors
+ https://github.com/sponsors/hexgrad
+
+ ### Vast.ai Referral Link
+ Vast.ai is a vendor I use for cloud GPUs. I "earn 3% of all referred customer revenue as credits": [https://cloud.vast.ai/?ref_id=79907](https://cloud.vast.ai/?ref_id=79907)
+
+ ### Vast.ai Transfer Credit
+ To **anonymously** transfer $5 of credit directly to my Vast.ai account `[email protected]`, you can use `transfer credit` in the Vast CLI like so:
+
+ ```sh
+ vastai transfer credit [email protected] 5
+ ```
+
+ The usage of the `transfer credit` command is documented here: https://docs.vast.ai/api/commands#voyTE
+
+ ```sh
+ usage: vastai transfer credit RECIPIENT AMOUNT
+
+ positional arguments:
+   recipient    email of recipient account
+   amount       $dollars of credit to transfer
+
+ Transfer (amount) credits to account with email (recipient).
+ ```
+
+ Note that `transfer credit` seems to be an anonymous command. If I don't say thank you, it's because I can't see the sender!
+
+ ### RunPod Referral Link
+ RunPod is another vendor I use for cloud GPUs. I earn "5% from serverless and 1% from templates": https://runpod.io?ref=pup8o2ly
+
+ ### RunPod Credit Codes
+ After signing in to RunPod, under `Account > Billing > Credit Codes`, you can "generate a code that allows you to gift funds to another RunPod user":
+
+ > Simply give them the code and they will be able to redeem it for credits on their billing page. Please safeguard your codes as they are worth money!
+ > Credits will be debited from your account immediately. You can redeem the code yourself if you want to recover your credits. There is a 2% transaction fee for payment processing!
+
+ If you wish to send codes, you can do so by emailing `[email protected]`, or DM me on Discord `@rzvzn`.
README.md ADDED
@@ -0,0 +1,143 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - yl4579/StyleTTS2-LJSpeech
+ pipeline_tag: text-to-speech
+ ---
+ **Kokoro** is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
+
+ <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M/resolve/main/samples/HEARME.wav" type="audio/wav"></audio>
+
+ ⬆️ **Kokoro has been upgraded to v1.0!** See [Releases](https://huggingface.co/hexgrad/Kokoro-82M#releases).
+
+ ✨ You can now [`pip install kokoro`](https://github.com/hexgrad/kokoro)! See [Usage](https://huggingface.co/hexgrad/Kokoro-82M#usage).
+
+ - [Releases](#releases)
+ - [Usage](#usage)
+ - [SAMPLES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/SAMPLES.md) ↗️
+ - [VOICES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md) ↗️
+ - [Model Facts](#model-facts)
+ - [Training Details](#training-details)
+ - [Creative Commons Attribution](#creative-commons-attribution)
+ - [Acknowledgements](#acknowledgements)
+
+ ### Releases
+
+ | Model | Published | Training Data | Langs & Voices | SHA256 |
+ | ----- | --------- | ------------- | -------------- | ------ |
+ | [v0.19](https://huggingface.co/hexgrad/kLegacy/tree/main/v0.19) | 2024 Dec 25 | <100 hrs | 1 & 10 | `3b0c392f` |
+ | **v1.0** | **2025 Jan 27** | **Few hundred hrs** | [**8 & 54**](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md) | `496dba11` |
+
+ | Training Costs | v0.19 | v1.0 | **Total** |
+ | -------------- | ----- | ---- | ----- |
+ | in A100 80GB GPU hours | 500 | 500 | **1000** |
+ | average hourly rate | $0.80/h | $1.20/h | **$1/h** |
+ | in USD | $400 | $600 | **$1000** |
+
+ ### Usage
+
+ [`pip install kokoro`](https://pypi.org/project/kokoro/) installs the inference library at https://github.com/hexgrad/kokoro
+
+ You can run this cell on [Google Colab](https://colab.research.google.com/). [Listen to samples](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/SAMPLES.md).
+ ```py
+ # 1️⃣ Install kokoro
+ !pip install -q "kokoro>=0.3.4" soundfile
+ # 2️⃣ Install espeak, used for English OOD fallback and some non-English languages
+ !apt-get -qq -y install espeak-ng > /dev/null 2>&1
+ # 🇪🇸 'e' => Spanish es
+ # 🇫🇷 'f' => French fr-fr
+ # 🇮🇳 'h' => Hindi hi
+ # 🇮🇹 'i' => Italian it
+ # 🇧🇷 'p' => Brazilian Portuguese pt-br
+
+ # 3️⃣ Initialize a pipeline
+ from kokoro import KPipeline
+ from IPython.display import display, Audio
+ import soundfile as sf
+ # 🇺🇸 'a' => American English, 🇬🇧 'b' => British English
+ # 🇯🇵 'j' => Japanese: pip install misaki[ja]
+ # 🇨🇳 'z' => Mandarin Chinese: pip install misaki[zh]
+ pipeline = KPipeline(lang_code='a') # <= make sure lang_code matches voice
+
+ # This text is for demonstration purposes only, unseen during training
+ text = '''
+ The sky above the port was the color of television, tuned to a dead channel.
+ "It's not like I'm using," Case heard someone say, as he shouldered his way through the crowd around the door of the Chat. "It's like my body's developed this massive drug deficiency."
+ It was a Sprawl voice and a Sprawl joke. The Chatsubo was a bar for professional expatriates; you could drink there for a week and never hear two words in Japanese.
+
+ These were to have an enormous impact, not only because they were associated with Constantine, but also because, as in so many other areas, the decisions taken by Constantine (or in his name) were to have great significance for centuries to come. One of the main issues was the shape that Christian churches were to take, since there was not, apparently, a tradition of monumental church buildings when Constantine decided to help the Christian church build a series of truly spectacular structures. The main form that these churches took was that of the basilica, a multipurpose rectangular structure, based ultimately on the earlier Greek stoa, which could be found in most of the great cities of the empire. Christianity, unlike classical polytheism, needed a large interior space for the celebration of its religious services, and the basilica aptly filled that need. We naturally do not know the degree to which the emperor was involved in the design of new churches, but it is tempting to connect this with the secular basilica that Constantine completed in the Roman forum (the so-called Basilica of Maxentius) and the one he probably built in Trier, in connection with his residence in the city at a time when he was still caesar.
+
+ [Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, [Kokoro](/kˈOkəɹO/) can be deployed anywhere from production environments to personal projects.
+ '''
+ # text = '「もしおれがただ偶然、そしてこうしようというつもりでなくここに立っているのなら、ちょっとばかり絶望するところだな」と、そんなことが彼の頭に思い浮かんだ。'
+ # text = '中國人民不信邪也不怕邪,不惹事也不怕事,任何外國不要指望我們會拿自己的核心利益做交易,不要指望我們會吞下損害我國主權、安全、發展利益的苦果!'
+ # text = 'Los partidos políticos tradicionales compiten con los populismos y los movimientos asamblearios.'
+ # text = 'Le dromadaire resplendissant déambulait tranquillement dans les méandres en mastiquant de petites feuilles vernissées.'
+ # text = 'ट्रांसपोर्टरों की हड़ताल लगातार पांचवें दिन जारी, दिसंबर से इलेक्ट्रॉनिक टोल कलेक्शनल सिस्टम'
+ # text = "Allora cominciava l'insonnia, o un dormiveglia peggiore dell'insonnia, che talvolta assumeva i caratteri dell'incubo."
+ # text = 'Elabora relatórios de acompanhamento cronológico para as diferentes unidades do Departamento que propõem contratos.'
+
+ # 4️⃣ Generate, display, and save audio files in a loop.
+ generator = pipeline(
+     text, voice='af_heart', # <= change voice here
+     speed=1, split_pattern=r'\n+'
+ )
+ for i, (gs, ps, audio) in enumerate(generator):
+     print(i) # i => index
+     print(gs) # gs => graphemes/text
+     print(ps) # ps => phonemes
+     display(Audio(data=audio, rate=24000, autoplay=i==0))
+     sf.write(f'{i}.wav', audio, 24000) # save each audio file
+ ```
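+
+ If you prefer one output file instead of one file per chunk, the pieces yielded by the generator can be concatenated before writing. This is a minimal sketch (not part of the library) that reuses the `pipeline` and `text` defined above and assumes each chunk is an array-like at the 24 kHz rate shown in the loop:
+ ```py
+ import numpy as np
+ import soundfile as sf
+
+ # Collect every audio chunk from the generator, then write a single wav file.
+ chunks = [audio for _, _, audio in pipeline(text, voice='af_heart', speed=1, split_pattern=r'\n+')]
+ sf.write('full.wav', np.concatenate(chunks), 24000)
+ ```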
+
+ Under the hood, `kokoro` uses [`misaki`](https://pypi.org/project/misaki/), a G2P library at https://github.com/hexgrad/misaki
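+
+ If all you need are phonemes, the simplest route is to read the `ps` values yielded by the pipeline, as in the loop above. Calling misaki directly is also possible; the sketch below is only an assumption about its English frontend (an `en.G2P` class with `trf`/`british`/`fallback` flags) and may not match the current API, so check the misaki repository before relying on it:
+ ```py
+ from misaki import en  # assumed entry point; see https://github.com/hexgrad/misaki
+
+ g2p = en.G2P(trf=False, british=False, fallback=None)  # American English, no espeak fallback (assumed signature)
+ phonemes, tokens = g2p('Kokoro is an open-weight TTS model.')
+ print(phonemes)
+ ```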
+
+ ### Model Facts
+
+ **Architecture:**
+ - StyleTTS 2: https://arxiv.org/abs/2306.07691
+ - ISTFTNet: https://arxiv.org/abs/2203.02395
+ - Decoder only: no diffusion, no encoder release
+
+ **Architected by:** Li et al @ https://github.com/yl4579/StyleTTS2
+
+ **Trained by:** `@rzvzn` on Discord
+
+ **Languages:** American English, British English, Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, Brazilian Portuguese
+
+ **Model SHA256 Hash:** `496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4`
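+
+ To confirm a local download matches the hash above (or, for the voice `.pt` files, the 8-character prefixes listed in [VOICES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md)), a small verification sketch:
+ ```py
+ import hashlib
+
+ # Adjust the path to wherever the checkpoint was downloaded.
+ with open('kokoro-v1_0.pth', 'rb') as f:
+     digest = hashlib.sha256(f.read()).hexdigest()
+ print(digest)
+ print(digest == '496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4')
+ ```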
+
+ ### Training Details
+
+ **Data:** Kokoro was trained exclusively on **permissive/non-copyrighted audio data** and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:
+ - Public domain audio
+ - Audio licensed under Apache, MIT, etc
+ - Synthetic audio<sup>[1]</sup> generated by closed<sup>[2]</sup> TTS models from large providers<br/>
+ [1] https://copyright.gov/ai/ai_policy_guidance.pdf<br/>
+ [2] No synthetic audio from open TTS models or "custom voice clones"
+
+ **Total Dataset Size:** A few hundred hours of audio
+
+ **Total Training Cost:** About $1000 for 1000 hours of A100 80GB vRAM
+
+ ### Creative Commons Attribution
+
+ The following CC BY audio was part of the dataset used to train Kokoro v1.0.
+
+ | Audio Data | Duration Used | License | Added to Training Set After |
+ | ---------- | ------------- | ------- | --------------------------- |
+ | [Koniwa](https://github.com/koniwa/koniwa) `tnc` | <1h | [CC BY 3.0](https://creativecommons.org/licenses/by/3.0/deed.ja) | v0.19 / 22 Nov 2024 |
+ | [SIWIS](https://datashare.ed.ac.uk/handle/10283/2353) | <11h | [CC BY 4.0](https://datashare.ed.ac.uk/bitstream/handle/10283/2353/license_text) | v0.19 / 22 Nov 2024 |
+
+ ### Acknowledgements
+
+ - 🛠️ [@yl4579](https://huggingface.co/yl4579) for architecting StyleTTS 2.
+ - 🏆 [@Pendrokar](https://huggingface.co/Pendrokar) for adding Kokoro as a contender in the TTS Spaces Arena.
+ - 📊 Thank you to everyone who contributed synthetic training data.
+ - ❤️ Special thanks to all compute sponsors.
+ - 👾 Discord server: https://discord.gg/QuGxSWBfQy
+ - 🪽 Kokoro is a Japanese word that translates to "heart" or "spirit". Kokoro is also the name of an [AI in the Terminator franchise](https://terminator.fandom.com/wiki/Kokoro).
+
+ <img src="https://static0.gamerantimages.com/wordpress/wp-content/uploads/2024/08/terminator-zero-41-1.jpg" width="400" alt="kokoro" />
SAMPLES.md ADDED
@@ -0,0 +1,48 @@
+ ### HEARME
+ <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M/resolve/main/samples/HEARME.wav" type="audio/wav"></audio>
+ > Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
+ ```
+ kˈOkəɹO ɪz ɐn ˈOpᵊnwˌAt tˌitˌiˈɛs mˈɑdᵊl wɪð ˈATi tˈu mˈɪljᵊn pəɹˈæməTəɹz. dəspˈIt ɪts lˈItwˌAt ˈɑɹkətˌɛkʧəɹ, ɪt dəlˈɪvəɹz kˈɑmpəɹəbᵊl kwˈɑləTi tə lˈɑɹʤəɹ mˈɑdᵊlz wˌIl bˈiɪŋ səɡnˈɪfəkəntli fˈæstəɹ ænd mˈɔɹ kˈɔstəfˌɪʃənt. wˌɪð əpˌæʧilˈIsᵊnst wˈAts, kˈOkəɹO kæn bi dəplˈYd ˈɛniwˌɛɹ fɹʌm pɹədˈʌkʃən ənvˈIɹənmᵊnts tə pˈɜɹsᵊnəl pɹˈɑʤˌɛkts.
+ ```
+
+ ### af_heart_0
+ <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M/resolve/main/samples/af_heart_0.wav" type="audio/wav"></audio>
+ > The sky above the port was the color of television, tuned to a dead channel.
+ ```
+ ðə skˈI əbˈʌv ðə pˈɔɹt wʌz ðə kˈʌləɹ ʌv tˈɛləvˌɪʒən, tˈund tə ɐ dˈɛd ʧˈænᵊl.
+ ```
+
+ ### af_heart_1
+ <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M/resolve/main/samples/af_heart_1.wav" type="audio/wav"></audio>
+ > "It's not like I'm using," Case heard someone say, as he shouldered his way through the crowd around the door of the Chat. "It's like my body's developed this massive drug deficiency."
+ ```
+ “ˌɪts nˌɑt lˈIk ˌIm jˈuzɪŋ,” kˈAs hˈɜɹd sˈʌmwˌʌn sˈA, æz hi ʃˈOldəɹd hɪz wˈA θɹu ðə kɹˈWd əɹˈWnd ðə dˈɔɹ ʌv ðə ʧˈæt. “ˌɪts lˈIk mI bˈɑdiz dəvˈɛləpt ðɪs mˈæsɪv dɹˈʌɡ dəfˈɪʃənsi.”
+ ```
+
+ ### af_heart_2
+ <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M/resolve/main/samples/af_heart_2.wav" type="audio/wav"></audio>
+ > It was a Sprawl voice and a Sprawl joke. The Chatsubo was a bar for professional expatriates; you could drink there for a week and never hear two words in Japanese.
+ ```
+ ˌɪt wʌz ɐ spɹˈɔl vˈYs ænd ɐ spɹˈɔl ʤˈOk. ðə ʧætsˈubO wʌz ɐ bˈɑɹ fɔɹ pɹəfˈɛʃᵊnəl ɛkspˈAtɹiəts; ju kʊd dɹˈɪŋk ðɛɹ fɔɹ ɐ wˈik ænd nˈɛvəɹ hˈɪɹ tˈu wˈɜɹdz ɪn ʤˌæpənˈiz.
+ ```
+
+ ### af_heart_3
+ <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M/resolve/main/samples/af_heart_3.wav" type="audio/wav"></audio>
+ > These were to have an enormous impact, not only because they were associated with Constantine, but also because, as in so many other areas, the decisions taken by Constantine (or in his name) were to have great significance for centuries to come. One of the main issues was the shape that Christian churches were to take, since there was not, apparently, a tradition of monumental church buildings when Constantine decided to help the Christian church build a series of truly spectacular structures.
+ ```
+ ðˌiz wɜɹ tə hæv ɐn ɪnˈɔɹməs ˈɪmpˌækt, nˌɑt ˈOnli bəkˈʌz ðA wɜɹ əsˈOsiˌATᵻd wɪð kˈɑnstəntˌin, bˌʌt ˈɔlsO bəkˈʌz, æz ɪn sˌO mˈɛni ˈʌðəɹ ˈɛɹiəz, ðə dəsˈɪʒᵊnz tˈAkən bI kˈɑnstəntˌin (ɔɹ ɪn hɪz nˈAm) wɜɹ tə hæv ɡɹˈAt səɡnˈɪfəkᵊns fɔɹ sˈɛnʧəɹiz tə kˈʌm. wˈʌn ʌv ðə mˈAn ˈɪʃjuz wʌz ðə ʃˈAp ðæt kɹˈɪsʧən ʧˈɜɹʧᵻz wɜɹ tə tˈAk, sˈɪns ðɛɹ wʌz nˌɑt, əpˈɛɹəntli, ɐ tɹədˈɪʃən ʌv mˌɑnjəmˈɛntᵊl ʧˈɜɹʧ bˈɪldɪŋz wˌɛn kˈɑnstəntˌin dəsˈIdᵻd tə hˈɛlp ðə kɹˈɪsʧən ʧˈɜɹʧ bˈɪld ɐ sˈɪɹiz ʌv tɹˈuli spɛktˈækjələɹ stɹˈʌkʧəɹz.
+ ```
+
+ ### af_heart_4
+ <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M/resolve/main/samples/af_heart_4.wav" type="audio/wav"></audio>
+ > The main form that these churches took was that of the basilica, a multipurpose rectangular structure, based ultimately on the earlier Greek stoa, which could be found in most of the great cities of the empire. Christianity, unlike classical polytheism, needed a large interior space for the celebration of its religious services, and the basilica aptly filled that need.
+ ```
+ ðə mˈAn fˈɔɹm ðæt ðiz ʧˈɜɹʧᵻz tˈʊk wʌz ðæt ʌv ðə bəsˈɪləkə, ɐ mˌʌltipˈɜɹpəs ɹɛktˈæŋɡjələɹ stɹˈʌkʧəɹ, bˈAst ˈʌltəmətli ˌɔn ði ˈɜɹliəɹ ɡɹˈik stˈOə, wˌɪʧ kʊd bi fˈWnd ɪn mˈOst ʌv ðə ɡɹˈAt sˈɪTiz ʌv ði ˈɛmpˌIəɹ. kɹˌɪsʧiˈænəTi, ˌʌnlˈIk klˈæsəkᵊl pˈɑliθiˌɪzəm, nˈidᵻd ɐ lˈɑɹʤ ɪntˈɪɹiəɹ spˈAs fɔɹ ðə sˌɛləbɹˈAʃən ʌv ɪts ɹəlˈɪʤəs sˈɜɹvəsᵻz, ænd ðə bəsˈɪləkə ˈæptli fˈɪld ðæt nˈid.
+ ```
+
+ ### af_heart_5
+ <audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M/resolve/main/samples/af_heart_5.wav" type="audio/wav"></audio>
+ > We naturally do not know the degree to which the emperor was involved in the design of new churches, but it is tempting to connect this with the secular basilica that Constantine completed in the Roman forum (the so-called Basilica of Maxentius) and the one he probably built in Trier, in connection with his residence in the city at a time when he was still caesar.
+ ```
+ wˌi nˈæʧəɹəli dˈu nˌɑt nˈO ðə dəɡɹˈi tə wˌɪʧ ði ˈɛmpəɹəɹ wʌz ɪnvˈɑlvd ɪn ðə dəzˈIn ʌv nˈu ʧˈɜɹʧᵻz, bˌʌt ɪt ɪz tˈɛmptɪŋ tə kənˈɛkt ðɪs wɪð ðə sˈɛkjələɹ bəsˈɪləkə ðæt kˈɑnstəntˌin kəmplˈiTᵻd ɪn ðə ɹˈOmən fˈɔɹəm (ðə sˌOkˈɔld bəsˈɪləkə ʌv mæksˈɛntiəs) ænd ðə wˈʌn hi pɹˈɑbəbli bˈɪlt ɪn tɹˈɪɹ, ɪn kənˈɛkʃən wɪð hɪz ɹˈɛzədᵊns ɪn ðə sˈɪTi æt ɐ tˈIm wˌɛn hi wʌz stˈɪl sˈizəɹ.
+ ```
VOICES.md ADDED
@@ -0,0 +1,161 @@
+ # Voices
+
+ - 🇺🇸 [American English](#american-english): 11F 9M
+ - 🇬🇧 [British English](#british-english): 4F 4M
+ - 🇯🇵 [Japanese](#japanese): 4F 1M
+ - 🇨🇳 [Mandarin Chinese](#mandarin-chinese): 4F 4M
+ - 🇪🇸 [Spanish](#spanish): 1F 2M
+ - 🇫🇷 [French](#french): 1F
+ - 🇮🇳 [Hindi](#hindi): 2F 2M
+ - 🇮🇹 [Italian](#italian): 1F 1M
+ - 🇧🇷 [Brazilian Portuguese](#brazilian-portuguese): 1F 2M
+
+ For each voice, the given grades are intended to be estimates of the **quality and quantity** of its associated training data, both of which impact overall inference quality.
+
+ Subjectively, voices will sound better or worse to different people.
+
+ Support for non-English languages may be absent or thin due to weak G2P and/or lack of training data. Some languages are represented by only a small handful of voices, or even just one (French).
+
+ Most voices perform best on a "goldilocks range" of 100-200 tokens out of ~500 possible. Voices may perform worse at the extremes:
+ - **Weakness** on short utterances, especially less than 10-20 tokens. Root cause could be lack of short-utterance training data and/or model architecture. One possible inference mitigation is to bundle shorter utterances together.
+ - **Rushing** on long utterances, especially over 400 tokens. You can chunk down to shorter utterances or adjust the `speed` parameter to mitigate this (see the sketch after this list).
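+
+ A minimal sketch of both mitigations (not part of the library): it splits on sentence-ish boundaries and then greedily merges short pieces so each chunk lands near the sweet spot. Character count is used as a rough proxy for the real phoneme-token count, so tune `max_chars` to taste:
+ ```py
+ import re
+
+ def rechunk(text, max_chars=300):
+     """Split text on sentence boundaries/newlines, then greedily merge short pieces."""
+     pieces = [p.strip() for p in re.split(r'(?<=[.!?])\s+|\n+', text) if p.strip()]
+     chunks, current = [], ''
+     for piece in pieces:
+         if current and len(current) + len(piece) + 1 > max_chars:
+             chunks.append(current)
+             current = piece
+         else:
+             current = f'{current} {piece}'.strip()
+     if current:
+         chunks.append(current)
+     return chunks
+
+ # Usage with the README pipeline: one chunk per line, split on newlines as usual.
+ # for gs, ps, audio in pipeline('\n'.join(rechunk(text)), voice='af_heart', split_pattern=r'\n+'):
+ #     ...
+ ```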
+
+ **Target Quality**
+ - How high quality is the reference voice? This grade may be impacted by audio quality, artifacts, compression, & sample rate.
+ - How well do the text labels match the audio? Text/audio misalignment (e.g. from hallucinations) will lower this grade.
+
+ **Training Duration**
+ - How much audio was seen during training? Smaller durations result in a lower overall grade.
+ - 10 hours <= **HH hours** < 100 hours
+ - 1 hour <= H hours < 10 hours
+ - 10 minutes <= MM minutes < 100 minutes
+ - 1 minute <= _M minutes_ 🤏 < 10 minutes
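+
+ Expressed as code, the duration buckets above are a simple threshold function (illustrative only; the repository does not ship such a helper):
+ ```py
+ def duration_label(minutes: float) -> str:
+     """Return the Training Duration bucket for a given amount of audio."""
+     if minutes >= 600:   # 10 hours or more
+         return 'HH hours'
+     if minutes >= 60:    # 1 hour or more
+         return 'H hours'
+     if minutes >= 10:
+         return 'MM minutes'
+     return 'M minutes 🤏'
+ ```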
+
+ ### American English
+
+ - `lang_code='a'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
+ - espeak-ng `en-us` fallback
+
+ | Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
+ | ---- | ------ | -------------- | ----------------- | ------------- | ------ |
+ | **af\_heart** | 🚺❤️ | | | **A** | `0ab5709b` |
+ | af_alloy | 🚺 | B | MM minutes | C | `6d877149` |
+ | af_aoede | 🚺 | B | H hours | C+ | `c03bd1a4` |
+ | af_bella | 🚺🔥 | **A** | **HH hours** | **A-** | `8cb64e02` |
+ | af_jessica | 🚺 | C | MM minutes | D | `cdfdccb8` |
+ | af_kore | 🚺 | B | H hours | C+ | `8bfbc512` |
+ | af_nicole | 🚺🎧 | B | **HH hours** | B- | `c5561808` |
+ | af_nova | 🚺 | B | MM minutes | C | `e0233676` |
+ | af_river | 🚺 | C | MM minutes | D | `e149459b` |
+ | af_sarah | 🚺 | B | H hours | C+ | `49bd364e` |
+ | af_sky | 🚺 | B | _M minutes_ 🤏 | C- | `c799548a` |
+ | am_adam | 🚹 | D | H hours | F+ | `ced7e284` |
+ | am_echo | 🚹 | C | MM minutes | D | `8bcfdc85` |
+ | am_eric | 🚹 | C | MM minutes | D | `ada66f0e` |
+ | am_fenrir | 🚹 | B | H hours | C+ | `98e507ec` |
+ | am_liam | 🚹 | C | MM minutes | D | `c8255075` |
+ | am_michael | 🚹 | B | H hours | C+ | `9a443b79` |
+ | am_onyx | 🚹 | C | MM minutes | D | `e8452be1` |
+ | am_puck | 🚹 | B | H hours | C+ | `dd1d8973` |
+ | am_santa | 🚹 | C | _M minutes_ 🤏 | D- | `7f2f7582` |
+
+ ### British English
+
+ - `lang_code='b'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
+ - espeak-ng `en-gb` fallback
+
+ | Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
+ | ---- | ------ | -------------- | ----------------- | ------------- | ------ |
+ | bf_alice | 🚺 | C | MM minutes | D | `d292651b` |
+ | bf_emma | 🚺 | B | **HH hours** | B- | `d0a423de` |
+ | bf_isabella | 🚺 | B | MM minutes | C | `cdd4c370` |
+ | bf_lily | 🚺 | C | MM minutes | D | `6e09c2e4` |
+ | bm_daniel | 🚹 | C | MM minutes | D | `fc3fce4e` |
+ | bm_fable | 🚹 | B | MM minutes | C | `d44935f3` |
+ | bm_george | 🚹 | B | MM minutes | C | `f1bc8122` |
+ | bm_lewis | 🚹 | C | H hours | D+ | `b5204750` |
+
+ ### Japanese
+
+ - `lang_code='j'` in [`misaki[ja]`](https://github.com/hexgrad/misaki)
+ - Total Japanese training data: H hours
+
+ | Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 | CC BY |
+ | ---- | ------ | -------------- | ----------------- | ------------- | ------ | ----- |
+ | jf_alpha | 🚺 | B | H hours | C+ | `1bf4c9dc` | |
+ | jf_gongitsune | 🚺 | B | MM minutes | C | `1b171917` | [gongitsune](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__gongitsune.txt) |
+ | jf_nezumi | 🚺 | B | _M minutes_ 🤏 | C- | `d83f007a` | [nezuminoyomeiri](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__nezuminoyomeiri.txt) |
+ | jf_tebukuro | 🚺 | B | MM minutes | C | `0d691790` | [tebukurowokaini](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__tebukurowokaini.txt) |
+ | jm_kumo | 🚹 | B | _M minutes_ 🤏 | C- | `98340afd` | [kumonoito](https://github.com/koniwa/koniwa/blob/master/source/tnc/tnc__kumonoito.txt) |
+
+ ### Mandarin Chinese
+
+ - `lang_code='z'` in [`misaki[zh]`](https://github.com/hexgrad/misaki)
+ - Total Mandarin Chinese training data: H hours
+
+ | Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
+ | ---- | ------ | -------------- | ----------------- | ------------- | ------ |
+ | zf_xiaobei | 🚺 | C | MM minutes | D | `9b76be63` |
+ | zf_xiaoni | 🚺 | C | MM minutes | D | `95b49f16` |
+ | zf_xiaoxiao | 🚺 | C | MM minutes | D | `cfaf6f2d` |
+ | zf_xiaoyi | 🚺 | C | MM minutes | D | `b5235dba` |
+ | zm_yunjian | 🚹 | C | MM minutes | D | `76cbf8ba` |
+ | zm_yunxi | 🚹 | C | MM minutes | D | `dbe6e1ce` |
+ | zm_yunxia | 🚹 | C | MM minutes | D | `bb2b03b0` |
+ | zm_yunyang | 🚹 | C | MM minutes | D | `5238ac22` |
+
+ ### Spanish
+
+ - `lang_code='e'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
+ - espeak-ng `es`
+
+ | Name | Traits | SHA256 |
+ | ---- | ------ | ------ |
+ | ef_dora | 🚺 | `d9d69b0f` |
+ | em_alex | 🚹 | `5eac53f7` |
+ | em_santa | 🚹 | `aa8620cb` |
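+
+ For any of the non-English tables, usage mirrors the README: only `lang_code` and the voice name change. For example, a sketch for Spanish (assumes `kokoro` and espeak-ng are installed as in the README's Usage cell):
+ ```py
+ from kokoro import KPipeline
+ import soundfile as sf
+
+ pipeline = KPipeline(lang_code='e')  # 'e' => Spanish
+ for i, (gs, ps, audio) in enumerate(pipeline('Los partidos políticos tradicionales compiten con los populismos.', voice='ef_dora')):
+     sf.write(f'es_{i}.wav', audio, 24000)
+ ```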
+
+ ### French
+
+ - `lang_code='f'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
+ - espeak-ng `fr-fr`
+ - Total French training data: <11 hours
+
+ | Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 | CC BY |
+ | ---- | ------ | -------------- | ----------------- | ------------- | ------ | ----- |
+ | ff_siwis | 🚺 | B | <11 hours | B- | `8073bf2d` | [SIWIS](https://datashare.ed.ac.uk/handle/10283/2353) |
+
+ ### Hindi
+
+ - `lang_code='h'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
+ - espeak-ng `hi`
+ - Total Hindi training data: H hours
+
+ | Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
+ | ---- | ------ | -------------- | ----------------- | ------------- | ------ |
+ | hf_alpha | 🚺 | B | MM minutes | C | `06906fe0` |
+ | hf_beta | 🚺 | B | MM minutes | C | `63c0a1a6` |
+ | hm_omega | 🚹 | B | MM minutes | C | `b55f02a8` |
+ | hm_psi | 🚹 | B | MM minutes | C | `2f0f055c` |
+
+ ### Italian
+
+ - `lang_code='i'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
+ - espeak-ng `it`
+ - Total Italian training data: H hours
+
+ | Name | Traits | Target Quality | Training Duration | Overall Grade | SHA256 |
+ | ---- | ------ | -------------- | ----------------- | ------------- | ------ |
+ | if_sara | 🚺 | B | MM minutes | C | `6c0b253b` |
+ | im_nicola | 🚹 | B | MM minutes | C | `234ed066` |
+
+ ### Brazilian Portuguese
+
+ - `lang_code='p'` in [`misaki[en]`](https://github.com/hexgrad/misaki)
+ - espeak-ng `pt-br`
+
+ | Name | Traits | SHA256 |
+ | ---- | ------ | ------ |
+ | pf_dora | 🚺 | `07e4ff98` |
+ | pm_alex | 🚹 | `cf0ba8c5` |
+ | pm_santa | 🚹 | `d4210316` |
config.json ADDED
@@ -0,0 +1 @@
+ {"istftnet": {"upsample_kernel_sizes": [20, 12], "upsample_rates": [10, 6], "gen_istft_hop_size": 5, "gen_istft_n_fft": 20, "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]], "resblock_kernel_sizes": [3, 7, 11], "upsample_initial_channel": 512}, "dim_in": 64, "dropout": 0.2, "hidden_dim": 512, "max_conv_dim": 512, "max_dur": 50, "multispeaker": true, "n_layer": 3, "n_mels": 80, "n_token": 178, "style_dim": 128, "text_encoder_kernel_size": 5, "plbert": {"hidden_size": 768, "num_attention_heads": 12, "intermediate_size": 2048, "max_position_embeddings": 512, "num_hidden_layers": 12, "dropout": 0.1}, "vocab": {";": 1, ":": 2, ",": 3, ".": 4, "!": 5, "?": 6, "\u2014": 9, "\u2026": 10, "\"": 11, "(": 12, ")": 13, "\u201c": 14, "\u201d": 15, " ": 16, "\u0303": 17, "\u02a3": 18, "\u02a5": 19, "\u02a6": 20, "\u02a8": 21, "\u1d5d": 22, "\uab67": 23, "A": 24, "I": 25, "O": 31, "Q": 33, "S": 35, "T": 36, "W": 39, "Y": 41, "\u1d4a": 42, "a": 43, "b": 44, "c": 45, "d": 46, "e": 47, "f": 48, "h": 50, "i": 51, "j": 52, "k": 53, "l": 54, "m": 55, "n": 56, "o": 57, "p": 58, "q": 59, "r": 60, "s": 61, "t": 62, "u": 63, "v": 64, "w": 65, "x": 66, "y": 67, "z": 68, "\u0251": 69, "\u0250": 70, "\u0252": 71, "\u00e6": 72, "\u03b2": 75, "\u0254": 76, "\u0255": 77, "\u00e7": 78, "\u0256": 80, "\u00f0": 81, "\u02a4": 82, "\u0259": 83, "\u025a": 85, "\u025b": 86, "\u025c": 87, "\u025f": 90, "\u0261": 92, "\u0265": 99, "\u0268": 101, "\u026a": 102, "\u029d": 103, "\u026f": 110, "\u0270": 111, "\u014b": 112, "\u0273": 113, "\u0272": 114, "\u0274": 115, "\u00f8": 116, "\u0278": 118, "\u03b8": 119, "\u0153": 120, "\u0279": 123, "\u027e": 125, "\u027b": 126, "\u0281": 128, "\u027d": 129, "\u0282": 130, "\u0283": 131, "\u0288": 132, "\u02a7": 133, "\u028a": 135, "\u028b": 136, "\u028c": 138, "\u0263": 139, "\u0264": 140, "\u03c7": 142, "\u028e": 143, "\u0292": 147, "\u0294": 148, "\u02c8": 156, "\u02cc": 157, "\u02d0": 158, "\u02b0": 162, "\u02b2": 164, "\u2193": 169, "\u2192": 171, "\u2197": 172, "\u2198": 173, "\u1d7b": 177}, "quantization": {"group_size": 64, "bits": 8}}
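+
+ The config above is stored as a single flattened JSON line. A small inspection sketch (assumes `config.json` has been downloaded locally; the printed values come from the JSON shown here):
+ ```py
+ import json
+
+ with open('config.json') as f:
+     config = json.load(f)
+ print(json.dumps(config['istftnet'], indent=2))   # ISTFTNet decoder settings
+ print(config['n_token'], config['style_dim'])     # 178, 128
+ ```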
kokoro-v1_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4
+ size 327212226
kokoro-v1_0.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb2b8f22b906f2e70e7dae4c337457784b079fb44142d8f4da93d8a9ace905ed
+ size 289324650
samples/HEARME.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aad38e96fa60c91c995ac820ce6e86c28b0df7300177c0d3ca0766b9dc78feec
+ size 996044
samples/af_heart_0.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd7999ebbc8369779d5d3f504399ea466c909339f90231143416a7819a2047fc
+ size 237644
samples/af_heart_1.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fe4f363b785fdc233f94dd1885c94a2267f7ceeea8c7fb5cce6bfcf0f7b273d
+ size 517244
samples/af_heart_2.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:68ee75b2d503415a5b6edbd5230c823fbeb6b430d546b8c37e2284efcf280be8
+ size 496844
samples/af_heart_3.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e758efcd852e77569772ee5a424df724695903db61a323a6021ee1c6a50ca616
+ size 1407644
samples/af_heart_4.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d50c90f447686200052f4375dee55a3406ec0aa140473cf946e98fdfe860989b
+ size 1116044
samples/af_heart_5.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc4515b9479c219e4e9347463a859ea6716dd0eb45a520f5f42825a7662b5054
+ size 1033244
voices/af_alloy.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6d877149dd8b348fbad12e5845b7e43d975390e9f3b68a811d1d86168bef5aa3
+ size 523425
voices/af_aoede.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c03bd1a4c3716c2d8eaa3d50022f62d5c31cfbd6e15933a00b17fefe13841cc4
+ size 523425
voices/af_bella.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8cb64e02fcc8de0327a8e13817e49c76c945ecf0052ceac97d3081480e8e48d6
+ size 523425
voices/af_heart.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0ab5709b8ffab19bfd849cd11d98f75b60af7733253ad0d67b12382a102cb4ff
+ size 523425
voices/af_jessica.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cdfdccb8cc975aa34ee6b89642963b0064237675de0e41a30ae64cc958dd4e87
+ size 523435
voices/af_kore.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8bfbc512321c3db49dff984ac675fa5ac7eaed5a96cc31104d3a9080e179d69d
+ size 523420
voices/af_nicole.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c5561808bcf5250fe8c5f5de32caf2d94f27e57e95befdb098c5c85991d4c5da
+ size 523430
voices/af_nova.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e0233676ddc21908c37a1f102f6b88a59e4e5c1bd764983616eb9eda629dbcd2
+ size 523420
voices/af_river.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e149459bd9c084416b74756b9bd3418256a8b839088abb07d463730c369dab8f
+ size 523425
voices/af_sarah.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49bd364ea3be9eb3e9685e8f9a15448c4883112a7c0ff7ab139fa4088b08cef9
+ size 523425
voices/af_sky.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c799548aed06e0cb0d655a85a01b48e7f10484d71663f9a3045a5b9362e8512c
+ size 523351
voices/am_adam.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ced7e284aba12472891be1da3ab34db84cc05cc02b5889535796dbf2d8b0cb34
+ size 523420
voices/am_echo.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8bcfdc852bc985fb45c396c561e571ffb9183930071f962f1b50df5c97b161e8
+ size 523420
voices/am_eric.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ada66f0eefff34ec921b1d7474d7ac8bec00cd863c170f1c534916e9b8212aae
+ size 523420
voices/am_fenrir.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98e507eca1db08230ae3b6232d59c10aec9630022d19accac4f5d12fcec3c37a
+ size 523430
voices/am_liam.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c82550757ddb31308b97f30040dda8c2d609a9e2de6135848d0a948368138518
+ size 523420
voices/am_michael.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a443b79a4b22489a5b0ab7c651a0bcd1a30bef675c28333f06971abbd47bd37
+ size 523435
voices/am_onyx.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8452be16cd0f6da7b4579eaf7b1e4506e92524882053d86d72b96b9a7fed584
+ size 523420
voices/am_puck.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd1d8973f4ce4b7d8ae407c77a435f485dabc052081b80ea75c4f30b84f36223
+ size 523420
voices/am_santa.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f2f7582fa2b1f160e90aafe6d0b442a685e773608b6667e545d743b073e97a7
+ size 523425
voices/bf_alice.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d292651b6af6c0d81705c2580dcb4463fccc0ff7b8d618a471dbb4e45655b3f3
+ size 523425
voices/bf_emma.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0a423deabf4a52b4f49318c51742c54e21bb89bbbe9a12141e7758ddb5da701
+ size 523420
voices/bf_isabella.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cdd4c37003805104d1d08fb1e05855c8fb2c68de24ca6e71f264a30aaa59eefd
+ size 523440
voices/bf_lily.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e09c2e481e2d53004d7e5ae7d3a325369e130a6f45c35a6002de75084be9285
+ size 523420
voices/bm_daniel.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc3fce4e9c12ed4dbc8fa9680cfe51ee190a96444ce7c3ad647549a30823fc5d
+ size 523430
voices/bm_fable.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d44935f3135257a9064df99f007fc1342ff1aa767552b4a4fa4c3b2e6e59079c
+ size 523425
voices/bm_george.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1bc812213dc59774769e5c80004b13eeb79bd78130b11b2d7f934542dab811b
+ size 523430
voices/bm_lewis.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b5204750dcba01029d2ac9cec17aec3b20a6d64073c579d694a23cb40effbd0e
+ size 523425
voices/ef_dora.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d9d69b0f8a2b87a345f269d89639f89dfbd1a6c9da0c498ae36dd34afcf35530
+ size 523420
voices/em_alex.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5eac53f767c3f31a081918ba531969aea850bed18fe56419b804d642c6973431
+ size 523420
voices/em_santa.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa8620cb96cec705823efca0d956a63e158e09ad41aca934d354b7f0778f63cb
+ size 523430
voices/ff_siwis.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8073bf2d2c4b9543a90f2f0fd2144de4ed157e2d4b79ddeb0d5123066171fbc9
+ size 523425
voices/hf_alpha.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:06906fe05746d13a79c5c01e21fd7233b05027221a933c9ada650f5aafc8f044
+ size 523425
voices/hf_beta.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:63c0a1a6272e98d43f4511bba40e30dd9c8ceaf5f39af869509b9f51a71c503e
+ size 523420
voices/hm_omega.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b55f02a8e8483fffe0afa566e7d22ed8013acf47ad4f6bbee2795a840155703e
+ size 523425