How to use this model to get just the audio embeddings?
#4 by mohitmayank - opened
Is it possible to use this model for speaker identification? If so, how can I use it to get just the audio embeddings?
This model does not use speaker representations (the speaker embeddings used in speaker verification tasks, e.g. x-vectors), so unfortunately you cannot use it for speaker recognition tasks. Try TitaNet (https://huggingface.co/nvidia/speakerverification_en_titanet_large) in the NeMo toolkit, which is a speaker embedding extractor. You can extract speaker embeddings on top of the Sortformer diarizer's output (filtering out silence etc.).
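For anyone landing here later, a minimal sketch of the TitaNet route, not an official recipe. It assumes NeMo is installed, that `EncDecSpeakerLabelModel.from_pretrained` and `get_embedding` behave as described on the TitaNet model card, and that you have already cut per-speaker audio files out of the diarizer's segments (that step is not shown). The helper names and the `0.7` threshold are illustrative assumptions and should be tuned on your own data.

```python
import math
from typing import List, Sequence


def load_titanet():
    """Load the pretrained TitaNet speaker model (downloads on first use)."""
    # Imported lazily so the pure-Python helpers below work without NeMo.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
        "nvidia/speakerverification_en_titanet_large"
    )


def extract_embedding(model, wav_path: str) -> List[float]:
    """Return one speaker embedding for an audio file as a flat list of floats."""
    # get_embedding returns a torch tensor; flatten it to a plain list.
    embedding = model.get_embedding(wav_path)
    return embedding.squeeze().cpu().tolist()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.7) -> bool:
    """Decide whether two embeddings belong to the same speaker.

    The default threshold is an assumption for illustration only.
    """
    return cosine_similarity(a, b) >= threshold
```

Usage would look roughly like this, where the two file names are hypothetical per-speaker clips taken from the diarization output:

```python
model = load_titanet()
emb_a = extract_embedding(model, "speaker_0_segment.wav")
emb_b = extract_embedding(model, "enrolled_speaker.wav")
print(same_speaker(emb_a, emb_b))
```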
mohitmayank changed discussion status to closed