MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published 4 days ago • 60
WavTokenizer-Medium-Large Collection https://arxiv.org/abs/2408.16532 • 4 items • Updated Feb 25 • 11
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024 • 50
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published 28 days ago • 68
PERSE: Personalized 3D Generative Avatars from A Single Portrait Paper • 2412.21206 • Published Dec 30, 2024 • 19
Article Transformers.js v3: WebGPU support, new models & tasks, and more… Oct 22, 2024 • 72
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated about 1 month ago • 111
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024 • 46
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10 • 60
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Feb 20 • 50
GeoPixel Collection Pixel Grounding Large Multimodal Model in Remote Sensing • 5 items • Updated Feb 26 • 1