How to use this model to just get audio embedding?

by mohitmayank - opened 3 days ago

3 days ago

Is it possible to use this model for speaker identification? If so, how can I use it to get just the audio embeddings?

taejinp

NVIDIA org 3 days ago

This model does not use speaker representations (the speaker embeddings used in speaker verification tasks, e.g. x-vectors), so unfortunately you cannot use it for speaker recognition tasks. Try TitaNet (https://huggingface.co/nvidia/speakerverification_en_titanet_large) in the NeMo toolkit, which is a speaker embedding extractor. You can extract speaker embeddings on top of the Sortformer diarizer's output (filtering out silence, etc.).
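For anyone landing here later, a rough sketch of the TitaNet route might look like the following. It assumes `nemo_toolkit[asr]` is installed and uses the `EncDecSpeakerLabelModel.from_pretrained` and `get_embedding` calls shown on the TitaNet model card. The helper names (`load_titanet`, `extract_embedding`, `identify_speaker`), the enrolled-speaker dictionary, and the 0.7 threshold are hypothetical choices for illustration, not part of NeMo, and the threshold would need tuning on your own data.

```python
import math
from functools import lru_cache

MODEL_NAME = "nvidia/speakerverification_en_titanet_large"


@lru_cache(maxsize=1)
def load_titanet():
    """Load TitaNet once. NeMo is imported lazily so the helpers below work without it."""
    import nemo.collections.asr as nemo_asr  # requires nemo_toolkit[asr]
    return nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(MODEL_NAME)


def extract_embedding(wav_path):
    """Return the speaker embedding of one audio file as a flat list of floats."""
    emb = load_titanet().get_embedding(wav_path)  # torch tensor
    return emb.squeeze().cpu().tolist()


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(embedding, enrolled, threshold=0.7):
    """Match an embedding against enrolled speakers (name -> embedding).

    Returns (name, score) for the best match, or (None, score) if the best
    score falls below the threshold. The threshold is an assumed value.
    """
    best_name, best_score = None, -1.0
    for name, ref in enrolled.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical usage (file names are placeholders):
# enrolled = {"alice": extract_embedding("alice_enroll.wav"),
#             "bob": extract_embedding("bob_enroll.wav")}
# name, score = identify_speaker(extract_embedding("unknown.wav"), enrolled)
```

To combine this with Sortformer as suggested above, one option is to cut the audio into the speech segments the diarizer reports and run `extract_embedding` on each segment, so silence does not dilute the embedding.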

mohitmayank changed discussion status to closed 2 days ago
