Prompt wav should be fixed to args.prompt_wav

#2
by petronny - opened

Hi, I noticed that the model currently always mimics the timbre of the last voice input, and raises an error when only text input is given.

This behavior appears to come from https://huggingface.co/spaces/Steveeeeeeen/Step-Audio-2-mini/blob/main/app.py#L175, where the prompt wav is changed.

The audio input box is designed for voice input, not for uploading a prompt wav. Could you please fix the prompt wav to args.prompt_wav?
The expected behavior is that the model always responds in the default timbre, with either voice or text input.
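
To illustrate the request, here is a minimal sketch of the intended behavior. It does not mirror the actual app.py: `build_args`, `resolve_prompt_wav`, the `--prompt-wav` flag spelling, and the `default_prompt.wav` path are hypothetical names chosen for illustration. The only point is that the prompt wav is taken from `args.prompt_wav` regardless of what the user supplies.

```python
import argparse


def build_args(argv=None):
    # Hypothetical argument parser; the flag name and default path are assumptions.
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompt-wav", dest="prompt_wav", default="default_prompt.wav")
    return parser.parse_args(argv)


def resolve_prompt_wav(args, user_audio_path=None):
    # The user's audio is treated purely as conversational input,
    # so it is deliberately ignored when choosing the timbre prompt.
    # This also covers text-only turns, where user_audio_path is None.
    return args.prompt_wav


args = build_args([])
voice_turn = resolve_prompt_wav(args, user_audio_path="user_recording.wav")
text_turn = resolve_prompt_wav(args, user_audio_path=None)
```

With this approach, both a voice turn and a text-only turn resolve to the same prompt wav, so the timbre stays at the default.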
