metadata

license: mit

hubert-base-jtube

This repo provides model weights for a HuBERT Base model trained on the JTubeSpeech corpus.
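As a rough illustration of how the weights might be used, the sketch below extracts frame-level features from a waveform. It assumes the weights are stored in Hugging Face `transformers` format and can be loaded with `HubertModel.from_pretrained`; this is not confirmed by this README. The names `extract_features` and `repo_id_or_path` are hypothetical, and the 16 kHz sample rate is an assumption based on how HuBERT Base models are usually trained. Any input normalization this checkpoint may expect is omitted.

```python
# Assumption: HuBERT Base checkpoints typically expect 16 kHz mono audio.
SAMPLE_RATE = 16_000


def extract_features(repo_id_or_path, waveform):
    """Return frame-level hidden states for one mono waveform.

    repo_id_or_path: hypothetical placeholder for this repo's Hub id or a
        local directory containing the downloaded weights.
    waveform: sequence of float samples at SAMPLE_RATE.
    """
    # Imported lazily so the module can be imported without torch installed.
    import torch
    from transformers import HubertModel

    # Assumption: the checkpoint is in transformers format.
    model = HubertModel.from_pretrained(repo_id_or_path)
    model.eval()

    # HubertModel takes raw audio as input_values with shape (batch, samples).
    input_values = torch.tensor([list(waveform)], dtype=torch.float32)
    with torch.no_grad():
        outputs = model(input_values)

    # Shape: (1, num_frames, hidden_size)
    return outputs.last_hidden_state
```

Calling `extract_features(repo_id_or_path, samples)` with one second of audio would return a tensor of per-frame embeddings; audio recorded at another rate would need resampling first.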

Dataset

We extracted approximately 2,720 hours of Japanese speech from the single-speaker subset of the JTubeSpeech corpus. The training data comprises approximately 6,000,000 utterances from about 55,000 speakers.
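For a sense of scale, the figures above imply some rough per-utterance and per-speaker averages. The short calculation below derives them; since the inputs are approximate, the results are only order-of-magnitude estimates and are not statistics reported by the authors.

```python
# Approximate figures quoted in the Dataset section.
HOURS = 2720
UTTERANCES = 6_000_000
SPEAKERS = 55_000

total_seconds = HOURS * 3600

# Derived averages (rough estimates only).
sec_per_utt = total_seconds / UTTERANCES
utt_per_speaker = UTTERANCES / SPEAKERS
min_per_speaker = HOURS * 60 / SPEAKERS

print(f"{sec_per_utt:.2f} s per utterance")             # 1.63 s per utterance
print(f"{utt_per_speaker:.0f} utterances per speaker")  # 109 utterances per speaker
print(f"{min_per_speaker:.1f} min per speaker")         # 3.0 min per speaker
```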

Contributors

中田亘
関健太郎
谷中瞳
佐伯高明
齋藤佑樹
高道慎之介

Acknowledgements

This work was supported by the FY2023 (Reiwa 5) KAKUSEI Project of the National Institute of Advanced Industrial Science and Technology (AIST).