Zhisheng Zheng

zhisheng01

https://zhishengzheng.com/

zhisheng147

AI & ML interests

LLM, Speech and Audio Processing


Organizations

None yet

zhisheng01's activity

upvoted a paper 6 days ago

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Paper • 2503.04724 • Published 7 days ago • 59

upvoted 2 papers 16 days ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published 16 days ago • 69

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published 22 days ago • 66

upvoted a paper 22 days ago

Soundwave: Less is More for Speech-Text Alignment in LLMs

Paper • 2502.12900 • Published 23 days ago • 77

liked a dataset 28 days ago

baijs/AudioSetCaps

Preview • Updated Nov 27, 2024 • 335 • 18

liked 2 models about 1 month ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Text Generation • Updated 18 days ago • 1.59M • 1.03k

deepseek-ai/DeepSeek-R1

Text Generation • Updated 18 days ago • 2.75M • 11.3k

upvoted 2 papers about 1 month ago

AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting

Paper • 2502.05176 • Published Feb 7 • 32

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6 • 25

liked a dataset about 1 month ago

CAiRE/ASCEND

Viewer • Updated Jul 16, 2024 • 12.3k • 1.48k • 33

upvoted an article about 1 month ago

Article

Recipe: Preparing Multilingual Speech Datasets for TTS Training

and 1 other • Nov 4, 2024 • 18

upvoted a paper about 2 months ago

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published Jan 10 • 48

liked a model 2 months ago

deepseek-ai/DeepSeek-V3

Text Generation • Updated 18 days ago • 3.12M • 3.63k

liked 2 models 4 months ago

nyrahealth/CrisperWhisper

Automatic Speech Recognition • Updated Dec 19, 2024 • 23.6k • 245

kyutai/mimi

Feature Extraction • Updated Sep 18, 2024 • 162k • 108

liked a dataset 5 months ago

walkerhyf/NCSSD

Updated Nov 12, 2024 • 91 • 20

upvoted a paper 5 months ago

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 92

liked a model 5 months ago

SWivid/F5-TTS

Text-to-Speech • Updated 1 day ago • 937k • 941

updated a dataset 5 months ago

zhisheng01/SpatialAudio

Preview • Updated Oct 12, 2024 • 111 • 3

upvoted a paper 5 months ago

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published Oct 9, 2024 • 44