How to use this model to just get audio embedding?

#4
by mohitmayank - opened

Is it possible to use this model for speaker identification? If so, how can I use it to get just the audio embeddings?

NVIDIA org

This model does not use speaker representations (the speaker embeddings used in speaker verification tasks, e.g. x-vectors), so unfortunately you cannot use it for speaker recognition tasks. Try TitaNet (https://huggingface.co/nvidia/speakerverification_en_titanet_large) in the NeMo toolkit, which is a speaker embedding extractor. You can apply speaker embeddings on top of the Sortformer diarizer's output (after filtering out silence, etc.).
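A minimal sketch of the TitaNet suggestion above, not an official recipe. It assumes NeMo is installed and that `EncDecSpeakerLabelModel.get_embedding` returns a single embedding tensor per file, as the TitaNet model card describes. The helper names (`extract_embedding`, `cosine_similarity`, `same_speaker`) and the 0.7 threshold are illustrative assumptions, and cutting the diarizer's speech segments into separate audio files is left out.

```python
import math
from typing import List, Sequence


def extract_embedding(wav_path: str) -> List[float]:
    """Return a speaker embedding for one audio file using TitaNet via NeMo."""
    # Imported lazily so the comparison helpers below work without NeMo.
    import nemo.collections.asr as nemo_asr

    # Model name taken from the link in the reply above.
    model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
        "nvidia/speakerverification_en_titanet_large"
    )
    # Assumption: the result is a torch tensor holding one embedding vector.
    emb = model.get_embedding(wav_path)
    return emb.squeeze().cpu().tolist()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(
    a: Sequence[float], b: Sequence[float], threshold: float = 0.7
) -> bool:
    """Decide whether two embeddings likely belong to the same speaker.

    The threshold is an illustrative value and should be tuned on your data.
    """
    return cosine_similarity(a, b) >= threshold
```

For speaker identification, you could call `extract_embedding` once per enrolled speaker and once per diarized segment, then pick the enrolled speaker with the highest `cosine_similarity`. In practice the model should be loaded once and reused rather than reloaded on every call as in this sketch.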

mohitmayank changed discussion status to closed
