Commits · nsfwalex/whisper-transcribe-new

Enhance speaker assignment in transcription: Introduced interval overlap calculations and smoothing techniques for improved accuracy in speaker labeling. Added methods for determining dominant speakers and stabilizing segment boundaries.

f800f63

liuyang committed on Sep 11
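
The overlap-based speaker assignment this commit describes could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's code: the names `overlap`, `dominant_speaker`, and `smooth_labels`, the `(start, end, speaker)` tuple layout, and the choice of majority-vote smoothing are all hypothetical.

```python
from collections import Counter, defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def dominant_speaker(seg_start, seg_end, turns, default="UNKNOWN"):
    """Return the speaker whose diarization turns overlap the segment the most.

    `turns` is a list of (start, end, speaker) tuples (hypothetical layout).
    """
    totals = defaultdict(float)
    for t_start, t_end, speaker in turns:
        totals[speaker] += overlap(seg_start, seg_end, t_start, t_end)
    # No turn touches the segment at all: fall back to the default label.
    if not totals or max(totals.values()) == 0.0:
        return default
    return max(totals, key=totals.get)


def smooth_labels(labels, window=3):
    """Majority-vote smoothing over a sliding window to suppress one-off flips."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighbourhood = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighbourhood).most_common(1)[0][0])
    return smoothed
```

In this sketch each transcribed segment takes the label of the speaker with the largest total overlap, and a single stray label between two segments of the same speaker is voted away by its neighbours.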

enable result printing and comment out text cleanup regex

aa984fe

liuyang committed on Sep 11

model control

caef0e2

liuyang committed on Sep 5

remove prompt

744c18a

liuyang committed on Sep 5

update prompt

ceb7ebf

liuyang committed on Sep 5

add prompt

c543860

liuyang committed on Sep 5

add prompt

c59adf8

liuyang committed on Sep 5

fix bug

2861a47

liuyang committed on Sep 5

disable batch

6a522dd

liuyang committed on Sep 5

speech duration

2a0543d

liuyang committed on Sep 5

less speech duration

6159f83

liuyang committed on Sep 5

enable vad

51ab2c6

liuyang committed on Sep 5

Refine VAD parameters and transcription options in WhisperTranscriber for improved audio processing. Adjust max speech duration, min speech duration, and silence duration, and set chunk length to 12 seconds.

5411f5d

liuyang committed on Sep 5

update log, no vad

9f7c374

liuyang committed on Sep 4

print log

d947708

liuyang committed on Sep 4

add log

04482af

liuyang committed on Sep 4

disable checksum

24dd7ba

liuyang committed on Sep 4

update

d5d2af9

liuyang committed on Sep 4

fix upload issue

e84487c

liuyang committed on Sep 4

upload data

731f4bf

liuyang committed on Sep 4

Add job_id and task_id handling in WhisperTranscriber to improve metadata management during audio processing. Update file key generation for intermediate uploads.

5dddf57

liuyang committed on Sep 4

fix

4417549

liuyang committed on Sep 3

add upload

06b904d

liuyang committed on Sep 3

Comment out debug print statements in WhisperTranscriber to clean up output during transcription and embedding calculations.

0724301

liuyang committed on Sep 1

Enhance waveform processing in WhisperTranscriber by implementing zero-padding for short audio segments and refining segment slicing logic. Update comments for clarity on embedding requirements and fallback mechanisms.

1461032

liuyang committed on Aug 28
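
A minimal sketch of the zero-padding and segment-slicing idea named in this commit, assuming a mono waveform held as a plain Python list. The function names `pad_to_min_length` and `slice_segment` and the `min_duration_s` parameter are hypothetical; the actual code presumably works on tensors that carry a channel dimension and may pad differently.

```python
def pad_to_min_length(samples, min_length):
    """Right-pad a mono sample list with zeros until it has at least min_length samples."""
    missing = min_length - len(samples)
    if missing <= 0:
        return list(samples)
    return list(samples) + [0.0] * missing


def slice_segment(samples, start_s, end_s, sample_rate, min_duration_s):
    """Cut [start_s, end_s) out of the waveform, clamp to its bounds, then zero-pad.

    Padding guarantees that a downstream embedding model (assumed to need a
    minimum input length) never receives a segment shorter than min_duration_s.
    """
    start = max(0, int(start_s * sample_rate))
    end = min(len(samples), int(end_s * sample_rate))
    segment = samples[start:end] if end > start else []
    return pad_to_min_length(segment, int(min_duration_s * sample_rate))
```

Clamping before padding means a segment that runs past the end of the audio is first truncated to the samples that exist and only then extended with silence.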

Update waveform handling in WhisperTranscriber to maintain channel dimension during embedding calculations. Adjust comments for clarity on input shape requirements.

5a14daf

liuyang committed on Aug 28

Add error handling in embedding calculation within WhisperTranscriber. Log exceptions and diarization segments for better debugging.

5b655f4

liuyang committed on Aug 28

try add token

3c30a0a

liuyang committed on Aug 23

try add token

baffcb8

liuyang committed on Aug 23

add log

f4b8170

liuyang committed on Aug 23

Implement audio preprocessing and speaker diarization enhancements in WhisperTranscriber. Introduce methods for audio chunk preparation, VAD-based trimming, and speaker embedding extraction. Update process_audio methods to utilize task JSON for improved workflow and metadata handling. Add webrtcvad dependency for voice activity detection.

70438f0

liuyang committed on Aug 21

print result

7c60c3b

liuyang committed on Jul 22

Add batched inference support in WhisperTranscriber for improved transcription performance. Update methods to accept batch size parameters and adjust output formatting accordingly.

7cf016f

liuyang committed on Jul 22

Update transcription batch size to 24 in WhisperTranscriber for improved processing efficiency

aaba71b

liuyang committed on Jul 22

Add model downloading functionality for Faster Whisper in app.py, enabling efficient local caching and improved model loading performance.

3ea9b86

liuyang committed on Jul 22

Refactor model loading in app.py to return both Whisper and diarization models, enhancing GPU utilization during transcription processes.

28a7e7e

liuyang committed on Jul 22

Refactor WhisperTranscriber to use pre-loaded models instead of loading them during transcription, improving performance and reducing overhead.

0cb30bb

liuyang committed on Jul 22

Remove model downloading functionality from app.py, reverting to a fallback model name for initialization.

024a455

liuyang committed on Jul 22

Update Whisper model repository reference in app.py to use OpenAI's whisper-large-v3-turbo

91b90b2

liuyang committed on Jul 22

Add model downloading functionality and update GPU initialization process in app.py

41c92a1

liuyang committed on Jul 22

init

2b27ee7

liuyang committed on Jul 22

init without gpu

fb95829

liuyang committed on Jul 22

int8

9d66376

liuyang committed on Jul 22

Update Whisper model compute type to float16 and adjust transcription batch size to 24 for improved performance

6ffe750

liuyang committed on Jul 22

Update Whisper model configuration to use int8 compute type for improved performance

8283fed

liuyang committed on Jul 21

print result

5db1c04

liuyang committed on Jul 21

Add full audio transcription functionality and update Gradio interface

8c68b8b

liuyang committed on Jul 21

Enhance error handling for nvidia-cudnn-cu12 integration in app.py. Added checks for the presence of the library and improved loading mechanism with appropriate error messages.

d441278

liuyang committed on Jul 21

refactor diarizer to remove FP16 model conversion and related configurations

ce51169

liuyang committed on Jul 21

comment out FP16 model conversion in diarizer

c00855b

liuyang committed on Jul 21
