Commit History

Enhance speaker assignment in transcription: Introduced interval overlap calculations and smoothing techniques for improved accuracy in speaker labeling. Added methods for determining dominant speakers and stabilizing segment boundaries.
f800f63

liuyang committed on

enable result printing and comment out text cleanup regex
aa984fe

liuyang committed on

model control
caef0e2

liuyang committed on

remove prompt
744c18a

liuyang committed on

update prompt
ceb7ebf

liuyang committed on

add prompt
c543860

liuyang committed on

add prompt
c59adf8

liuyang committed on

fix bug
2861a47

liuyang committed on

disable batch
6a522dd

liuyang committed on

speech duration
2a0543d

liuyang committed on

less speech duration
6159f83

liuyang committed on

enable vad
51ab2c6

liuyang committed on

Refine VAD parameters and transcription options in WhisperTranscriber for improved audio processing. Adjust max speech duration, min speech duration, and silence duration, and set chunk length to 12 seconds.
5411f5d

liuyang committed on

update log, no vad
9f7c374

liuyang committed on

print log
d947708

liuyang committed on

add log
04482af

liuyang committed on

disable checksum
24dd7ba

liuyang committed on

update
d5d2af9

liuyang committed on

fix upload issue
e84487c

liuyang committed on

upload data
731f4bf

liuyang committed on

Add job_id and task_id handling in WhisperTranscriber to improve metadata management during audio processing. Update file key generation for intermediate uploads.
5dddf57

liuyang committed on

fix
4417549

liuyang committed on

add upload
06b904d

liuyang committed on

Comment out debug print statements in WhisperTranscriber to clean up output during transcription and embedding calculations.
0724301

liuyang committed on

Enhance waveform processing in WhisperTranscriber by implementing zero-padding for short audio segments and refining segment slicing logic. Update comments for clarity on embedding requirements and fallback mechanisms.
1461032

liuyang committed on

Update waveform handling in WhisperTranscriber to maintain channel dimension during embedding calculations. Adjust comments for clarity on input shape requirements.
5a14daf

liuyang committed on

Add error handling in embedding calculation within WhisperTranscriber. Log exceptions and diarization segments for better debugging.
5b655f4

liuyang committed on

try add token
3c30a0a

liuyang committed on

try add token
baffcb8

liuyang committed on

add log
f4b8170

liuyang committed on

Implement audio preprocessing and speaker diarization enhancements in WhisperTranscriber. Introduce methods for audio chunk preparation, VAD-based trimming, and speaker embedding extraction. Update process_audio methods to utilize task JSON for improved workflow and metadata handling. Add webrtcvad dependency for voice activity detection.
70438f0

liuyang committed on

print result
7c60c3b

liuyang committed on

Add batched inference support in WhisperTranscriber for improved transcription performance. Update methods to accept batch size parameters and adjust output formatting accordingly.
7cf016f

liuyang committed on

Update transcription batch size to 24 in WhisperTranscriber for improved processing efficiency
aaba71b

liuyang committed on

Add model downloading functionality for Faster Whisper in app.py, enabling efficient local caching and improved model loading performance.
3ea9b86

liuyang committed on

Refactor model loading in app.py to return both Whisper and diarization models, enhancing GPU utilization during transcription processes.
28a7e7e

liuyang committed on

Refactor WhisperTranscriber to use pre-loaded models instead of loading them during transcription, improving performance and reducing overhead.
0cb30bb

liuyang committed on

Remove model downloading functionality from app.py, reverting to a fallback model name for initialization.
024a455

liuyang committed on

Update Whisper model repository reference in app.py to use OpenAI's whisper-large-v3-turbo
91b90b2

liuyang committed on

Add model downloading functionality and update GPU initialization process in app.py
41c92a1

liuyang committed on

init
2b27ee7

liuyang committed on

init without gpu
fb95829

liuyang committed on

int8
9d66376

liuyang committed on

Update Whisper model compute type to float16 and adjust transcription batch size to 24 for improved performance
6ffe750

liuyang committed on

Update Whisper model configuration to use int8 compute type for improved performance
8283fed

liuyang committed on

print result
5db1c04

liuyang committed on

Add full audio transcription functionality and update Gradio interface
8c68b8b

liuyang committed on

Enhance error handling for nvidia-cudnn-cu12 integration in app.py. Added checks for the presence of the library and improved loading mechanism with appropriate error messages.
d441278

liuyang committed on

refactor diarizer to remove FP16 model conversion and related configurations
ce51169

liuyang committed on

comment out FP16 model conversion in diarizer
c00855b

liuyang committed on