nvidia/Frame_VAD_Multilingual_MarbleNet_v2.0


explainability.md (uploaded by naymaraq, commit 0444bfd)

Field | Response
:-----|:-----
Intended Domain: | Voice Activity Detection (VAD)
Model Type: | Convolutional Neural Network (CNN)
Intended Users: | Developers, Speech Processing Engineers, AI Researchers
Output: | Sequence of speech probabilities, one for each 20-millisecond audio frame
Describe how the model works: | The model extracts spectrogram features from the input audio and passes them through MarbleNet, a lightweight CNN-based model designed for VAD. The CNN learns patterns associated with speech activity and outputs a probability score indicating the presence of speech in each 20-millisecond frame.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
Technical Limitations: | The model operates on 20-millisecond frames. Longer frames are supported by splitting them into smaller segments, but outputs at a granularity finer than 20 milliseconds are not supported.
Verified to have met prescribed NVIDIA quality standards: | Yes
Performance Metrics: | Accuracy (False Positive Rate, ROC-AUC score), Latency, Throughput
Potential Known Risks: | The model was trained on a limited number of languages, including Chinese, English, French, Spanish, German, and Russian, so its quality may degrade for languages and accents not included in the training dataset.
Licensing: | [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license)
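Because the model outputs one speech probability per 20-millisecond frame, a common way to use that output is to threshold each frame and merge consecutive speech frames into timestamped segments. The following is a minimal Python sketch of that post-processing, not part of the model or of NeMo: the helper name `probs_to_segments` and the default threshold of 0.5 are hypothetical choices for illustration, and the input is assumed to be a plain list of per-frame probabilities already produced by the model.

```python
# Minimal sketch (hypothetical helper, not a NeMo API): convert per-frame speech
# probabilities into (start_sec, end_sec) speech segments.
# Assumptions: one probability per 20 ms frame, and a 0.5 decision threshold
# chosen only for illustration.
from typing import List, Optional, Tuple

FRAME_MS = 20  # frame length stated in the model card


def probs_to_segments(
    probs: List[float], threshold: float = 0.5, frame_ms: int = FRAME_MS
) -> List[Tuple[float, float]]:
    """Threshold each frame, then merge runs of consecutive speech frames."""
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # index of the first frame in the current speech run
    for i, p in enumerate(probs):
        is_speech = p >= threshold
        if is_speech and start is None:
            start = i  # a speech run begins
        elif not is_speech and start is not None:
            # the run covered frames [start, i), so it ends where frame i begins
            segments.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None
    if start is not None:
        # speech continues through the last frame
        segments.append((start * frame_ms / 1000, len(probs) * frame_ms / 1000))
    return segments


# Example: frames 2-4 and 7-8 exceed the threshold.
probs = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.1, 0.6, 0.9]
print(probs_to_segments(probs))  # [(0.04, 0.1), (0.14, 0.18)]
```

Computing boundaries in integer milliseconds before dividing by 1000 keeps timestamps as clean multiples of the frame length. Real pipelines often also smooth the probabilities or drop very short segments; those steps are omitted from this sketch.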
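The table lists False Positive Rate and ROC-AUC as the model's accuracy metrics. As a minimal sketch, assuming frame-aligned binary reference labels (1 = speech, 0 = non-speech), both can be computed from the per-frame probabilities as shown below. The function names and the 0.5 threshold are hypothetical, and NVIDIA's actual evaluation protocol is not described in this card and may differ.

```python
# Minimal sketch of frame-level FPR and ROC-AUC in pure Python.
# Assumptions: labels are frame-aligned with the model output (1 = speech,
# 0 = non-speech); the 0.5 threshold is illustrative only.
from typing import Sequence


def false_positive_rate(probs: Sequence[float], labels: Sequence[int],
                        threshold: float = 0.5) -> float:
    """Fraction of non-speech frames that the threshold marks as speech."""
    negatives = [p for p, y in zip(probs, labels) if y == 0]
    if not negatives:
        raise ValueError("FPR is undefined without non-speech frames")
    return sum(p >= threshold for p in negatives) / len(negatives)


def roc_auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC via the Mann-Whitney U statistic, averaging ranks over ties."""
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j to cover a block of tied scores
        while j + 1 < n and probs[order[j + 1]] == probs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs both speech and non-speech frames")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


probs = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(false_positive_rate(probs, labels))  # 0.0
print(roc_auc(probs, labels))  # 0.75
```

FPR depends on the chosen threshold, whereas ROC-AUC summarizes how well speech frames are ranked above non-speech frames across all thresholds. For larger evaluations, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity.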