Xilin Jiang

xi-j

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information

upvoted a paper 1 day ago

Unified Reward Model for Multimodal Understanding and Generation

authored a paper 14 days ago

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis

Organizations

None yet

xi-j's activity

upvoted 2 papers 1 day ago

S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information

Paper • 2503.05085 • Published 5 days ago • 43

Unified Reward Model for Multimodal Understanding and Generation

Paper • 2503.05236 • Published 5 days ago • 95

authored 6 papers 14 days ago

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis

Paper • 2407.09732 • Published Jul 13, 2024 • 9

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation

Paper • 2408.11849 • Published Aug 13, 2024

Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue

Paper • 2409.04927 • Published Sep 7, 2024

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

Paper • 2409.10058 • Published Sep 16, 2024 • 2

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Paper • 2309.09493 • Published Sep 18, 2023

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Paper • 2502.16794 • Published 16 days ago • 5

commented on a paper 14 days ago

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Paper • 2502.16794 • Published 16 days ago • 5

upvoted a paper 14 days ago

AAD-LLM: Neural Attention-Driven Auditory Scene Understanding

Paper • 2502.16794 • Published 16 days ago • 5

upvoted a paper 14 days ago

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published 21 days ago • 66

upvoted a paper about 1 month ago

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 66

upvoted 3 papers about 2 months ago

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published Jan 9 • 88

Enhancing Human-Like Responses in Large Language Models

Paper • 2501.05032 • Published Jan 9 • 50

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published Jan 10 • 48

upvoted 3 papers 5 months ago

UniMuMo: Unified Text, Music and Motion Generation

Paper • 2410.04534 • Published Oct 6, 2024 • 19

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171

Presto! Distilling Steps and Layers for Accelerating Music Generation

Paper • 2410.05167 • Published Oct 7, 2024 • 17

upvoted a paper 7 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 126