What are the best practices for handling long audio of 2-3 hours?
First, thanks for the great work. You are awesome!
I was experimenting with the model on very long files of decent quality (far from studio recordings, though).
The model is running locally on Docker Desktop with GPU acceleration and the default settings the model README suggests.
The transcription output is OK-ish, but I wasn't doing anything special. Any ideas on how to get the most out of it?
Thanks
Some things I've tried already:
- Experimenting with the faster-whisper GitHub repo, using some tweaks with version 1.1.0, without significant improvements.
- Setting hotwords, which chokes the model on an internal limit of 448 tokens.
- Passing an initial_prompt, which ran without issues but produced results that still need improvement.
- Cutting the audio into manageable chunks.
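When cutting long audio into chunks, one practical detail is to overlap adjacent chunks slightly so words falling on a boundary are not lost and the duplicated region can be reconciled afterwards. A minimal sketch of computing boundaries this way (`chunk_spans` is a hypothetical helper, not part of faster-whisper; the 10-minute/5-second values are just illustrative defaults):

```python
def chunk_spans(total_seconds, chunk_seconds=600.0, overlap_seconds=5.0):
    """Return (start, end) times in seconds for overlapping chunks.

    Each chunk starts `overlap_seconds` before the previous one ended,
    so boundary words appear in both chunks and can be de-duplicated
    when the transcripts are merged.
    """
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break
        start = end - overlap_seconds
    return spans


# e.g. a ~21-minute file split into 10-minute chunks with 5 s of overlap
print(chunk_spans(1250, chunk_seconds=600, overlap_seconds=5))
```

Each span can then be cut out with ffmpeg (`-ss`/`-t`) and transcribed independently.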
We are using faster whisper.
If you run it with runpod (https://github.com/ivrit-ai/runpod-serverless) it is quick.
Cut it into manageable chunks.
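For the chunking itself, if the audio is (or is converted to) WAV, Python's standard-library `wave` module is enough; for mp3 or other formats you would go through ffmpeg or pydub first. A sketch, assuming fixed-length chunks (`split_wav` is a hypothetical helper, and the chunk length is something to tune):

```python
import os
import wave


def split_wav(path, out_dir, chunk_seconds=600):
    """Split a WAV file into fixed-length chunks; returns the chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = os.path.join(out_dir, f"chunk_{index:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # nframes is corrected on close
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths
```

Each resulting chunk file can then be fed to the model one at a time.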
How does one decide the "manageable size"? Experimentally?
AFAIK, 25 MB is OpenAI's recommended/supported limit.
Should it be less or more?
Also, I would like to supply hotwords, but there seems to be a 448-token limit.
What is the best method to pass a larger hotword list with the prompt?
Currently I pass them using the system prompt.
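Since the 448-token cap is the Whisper decoder's context limit, a larger hotword list can't be passed in one shot; one workaround is to rank the hotwords and keep only as many as fit a rough per-chunk token budget. A sketch under those assumptions (`select_hotwords` and the tokens-per-word estimate are mine, not faster-whisper API; for exact counts you'd use the model's actual tokenizer):

```python
def select_hotwords(hotwords, max_tokens=200, tokens_per_word=2):
    """Greedily keep the highest-priority hotwords under a rough token budget.

    `hotwords` is assumed to be ordered most- to least-important.
    `tokens_per_word` is a coarse estimate of how many BPE tokens a
    word costs; the budget should stay well under 448 so the rest of
    the prompt and the decoded text still fit the context.
    """
    selected, used = [], 0
    for word in hotwords:
        cost = tokens_per_word * max(1, len(word.split()))
        if used + cost > max_tokens:
            break
        selected.append(word)
        used += cost
    return ", ".join(selected)
```

A per-chunk variant of the same idea is to pass only the hotwords likely to occur in the current chunk (for example, names seen in a cheap first-pass transcript of that chunk) instead of the full list.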