jakeBland
/

wav2vec-vm-finetune

Audio Classification

Generated from Trainer

speech-recognition

voicemail-detection

Inference Endpoints


jakeBland committed on Feb 10

Commit 0ab05ea · verified · 1 Parent(s): a0b79e4

Update README.md

Files changed (1)

README.md +18 -7

@@ -4,30 +4,41 @@ license: apache-2.0
 base_model: facebook/wav2vec2-xls-r-300m
 tags:
 - generated_from_trainer
 model-index:
-- name: test12
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/jakebland-bland-ai/vm_new/runs/bn1z5b94)
-# test12
-This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on an unknown dataset.
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure

 base_model: facebook/wav2vec2-xls-r-300m
 tags:
 - generated_from_trainer
+- speech-recognition
+- audio-classification
+- voicemail-detection
 model-index:
+- name: wav2vec-vm-finetune
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# wav2vec-vm-finetune
+This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) for **voicemail detection**. It is trained on a dataset of call recordings to distinguish between **voicemail greetings** and **live human responses**.
 ## Model description
+This model builds on **wav2vec2-xls-r-300m**, a self-supervised speech model trained on large-scale multilingual data. We fine-tuned it on the first two seconds of a call. The fine-tuned model reaches **98% accuracy**.
 ## Intended uses & limitations
+- Automated voicemail detection in AI-powered call assistants.
+- Filtering voicemail responses in customer service and sales call automation.
+- Only trained on the English language.
+- Assumes the voicemail track is isolated and contains no audio from the caller.
+- Designed for the first two seconds of audio when calling a voicemail.
 ## Training and evaluation data
+The model was trained on a proprietary dataset of call recordings, labeled as:
+- **Live human responses**
+- **Voicemail greetings**
+The dataset includes diverse voicemail recordings across multiple types to improve generalization.
 ## Training procedure
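
Reviewer note: the updated card says the model is designed for the first two seconds of a call, so callers would presumably crop audio before classifying it. The sketch below illustrates that step under several assumptions that the card does not state: 16 kHz mono input (the usual rate for wav2vec2 / XLS-R checkpoints), the repo id `jakeBland/wav2vec-vm-finetune` inferred from the page header, and a hypothetical helper `classify_call_start`. Label names are not given in the card, so none are assumed here.

```python
SAMPLE_RATE = 16_000  # assumption: wav2vec2 / XLS-R checkpoints expect 16 kHz mono audio
CLIP_SECONDS = 2.0    # from the model card: "the first two seconds of a call"


def first_two_seconds(samples, sample_rate=SAMPLE_RATE, seconds=CLIP_SECONDS):
    """Return only the leading `seconds` of a mono sample sequence."""
    n_keep = int(sample_rate * seconds)
    return list(samples[:n_keep])


def classify_call_start(samples, model_id="jakeBland/wav2vec-vm-finetune"):
    """Hypothetical helper: score the opening of a call with the fine-tuned model.

    Needs `transformers`, `torch` and `numpy`; they are imported lazily so the
    cropping helper above works without them.
    """
    import numpy as np
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    clip = np.asarray(first_two_seconds(samples), dtype=np.float32)
    # Returns a list of {"label": ..., "score": ...} dicts; the label names
    # come from the model's config and are not documented in the card.
    return classifier({"raw": clip, "sampling_rate": SAMPLE_RATE})
```

Audio shorter than two seconds is passed through unchanged rather than padded, since the card does not say how short clips were handled during training.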