What are the best practices for handling long audio of 2-3 hours?
First, thanks for the great work. You are awesome!
I was experimenting with the model on very long files of decent quality (far from studio recordings, though).
The model is running locally on Docker Desktop with GPU acceleration and the default settings the model README suggests.
The transcription output is OK-ish, but I wasn't doing anything special. Any ideas on how to get the most out of it?
Thanks
Some things I've tried already:
- Experimenting with the faster-whisper GitHub repo, using some tweaks with version 1.1.0, without significant improvements.
- Setting hotwords, which chokes the model on an internal limit of 448 tokens.
- Passing an initial_prompt, which ran without issues but produced results that still need improvement.
- Cutting the audio into manageable chunks.
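When cutting long audio into chunks, one practical detail is to overlap adjacent chunks slightly so words falling on a boundary are not lost and the duplicated region can be reconciled afterwards. A minimal sketch of computing boundaries this way (`chunk_spans` is a hypothetical helper, not part of faster-whisper; the 10-minute/5-second values are just illustrative defaults):

```python
def chunk_spans(total_seconds, chunk_seconds=600.0, overlap_seconds=5.0):
    """Return (start, end) times in seconds for overlapping chunks.

    Each chunk starts `overlap_seconds` before the previous one ended,
    so boundary words appear in both chunks and can be de-duplicated
    when the transcripts are merged.
    """
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break
        start = end - overlap_seconds
    return spans


# e.g. a ~21-minute file split into 10-minute chunks with 5 s of overlap
print(chunk_spans(1250, chunk_seconds=600, overlap_seconds=5))
```

Each span can then be cut out with ffmpeg (`-ss`/`-t`) and transcribed independently.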
We are using faster whisper.
If you run it with runpod (https://github.com/ivrit-ai/runpod-serverless) it is quick.
Cut it into manageable chunks.
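For the chunking itself, if the audio is (or is converted to) WAV, Python's standard-library `wave` module is enough; for mp3 or other formats you would go through ffmpeg or pydub first. A sketch, assuming fixed-length chunks (`split_wav` is a hypothetical helper, and the chunk length is something to tune):

```python
import os
import wave


def split_wav(path, out_dir, chunk_seconds=600):
    """Split a WAV file into fixed-length chunks; returns the chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = os.path.join(out_dir, f"chunk_{index:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # nframes is corrected on close
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths
```

Each resulting chunk file can then be fed to the model one at a time.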
How does one decide the "manageable size"? Experimentally?
AFAIK, 25 MB is OpenAI's recommended/supported limit.
Should it be less or more?
Also, I would like to supply hotwords, but there seems to be a 448-token limit.
What is the best method to pass a larger hotword list with the prompt?
Currently I pass them using the system prompt.
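Since the 448-token cap is the Whisper decoder's context limit, a larger hotword list can't be passed in one shot; one workaround is to rank the hotwords and keep only as many as fit a rough per-chunk token budget. A sketch under those assumptions (`select_hotwords` and the tokens-per-word estimate are mine, not faster-whisper API; for exact counts you'd use the model's actual tokenizer):

```python
def select_hotwords(hotwords, max_tokens=200, tokens_per_word=2):
    """Greedily keep the highest-priority hotwords under a rough token budget.

    `hotwords` is assumed to be ordered most- to least-important.
    `tokens_per_word` is a coarse estimate of how many BPE tokens a
    word costs; the budget should stay well under 448 so the rest of
    the prompt and the decoded text still fit the context.
    """
    selected, used = [], 0
    for word in hotwords:
        cost = tokens_per_word * max(1, len(word.split()))
        if used + cost > max_tokens:
            break
        selected.append(word)
        used += cost
    return ", ".join(selected)
```

A per-chunk variant of the same idea is to pass only the hotwords likely to occur in the current chunk (for example, names seen in a cheap first-pass transcript of that chunk) instead of the full list.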