Step-Audio 2🔥 New end-to-end multimodal LLM for audio & speech, released by StepFun
stepfun-ai/step-audio-2-68b003c3a47b273fffaf67a8
✨ Direct raw audio: text & speech, no ASR+LLM+TTS pipeline
✨ High-IQ reasoning: RL + CoT for paralinguistic cues
✨ Multimodal RAG + tool calling
✨ Emotion, timbre, dialect & style control
✨ SOTA on ASR, paralinguistic, speech dialog