Control over output
#12, opened by TeachableMachine
Hi,
Is there a way to control speaker selection?
Is the output limited to 30 seconds always?
You control speaker selection by providing a different .wav audio clip as the "prompt".
The output seems to be limited to 2048 "tokens". In my testing, I'm lucky to get 5 to 10 seconds of good output, depending on input clip quality and other factors.
I've seen this example of processing longer text here.
However, I'm not aware of any sampling params that could be used to give more consistent generations across multiple chunks. Is there some kind of "seed" or something to improve consistency perhaps?
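For what it's worth, here is a rough sketch of how I'd approach chunking in the meantime. It splits long text on sentence boundaries so each piece stays well under the output limit, then generates each piece with the same settings. Everything model-specific is a placeholder: `synthesize` stands for whatever call produces audio from text with your .wav prompt, and `reseed` is a hypothetical hook for resetting the random state before each chunk. If the model runs on PyTorch, passing `torch.manual_seed` as `reseed` would be a natural thing to try, but I have not verified that it makes voices more consistent across chunks. The 200-character budget is also a guess, not a documented limit.

```python
import re
from typing import Any, Callable, List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_long(
    text: str,
    synthesize: Callable[[str], Any],   # placeholder: text -> audio for one chunk
    reseed: Callable[[int], Any],       # placeholder: resets the sampler's random state
    seed: int = 0,
    max_chars: int = 200,
) -> List[Any]:
    """Generate one output per chunk, reseeding identically before each call."""
    outputs = []
    for chunk in split_into_chunks(text, max_chars):
        reseed(seed)
        outputs.append(synthesize(chunk))
    return outputs
```

The per-chunk outputs still need to be concatenated afterwards, and reusing the same .wav prompt for every chunk is probably the main thing keeping the voice stable.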