EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published Sep 17, 2024 • 18
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model Paper • 2406.04904 • Published Jun 7, 2024 • 6
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis Paper • 2404.19622 • Published Apr 30, 2024 • 2
Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm Paper • 2403.11781 • Published Mar 18, 2024 • 18
OLMo: Accelerating the Science of Language Models Paper • 2402.00838 • Published Feb 1, 2024 • 83
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis Paper • 2312.03491 • Published Dec 6, 2023 • 34
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation Paper • 2309.05455 • Published Sep 11, 2023
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis Paper • 2306.09417 • Published Jun 15, 2023 • 3
Prosody-controllable spontaneous TTS with neural HMMs Paper • 2211.13533 • Published Nov 24, 2022