Emotion Detection From Speech

This model is a fine-tuned version of DistilHuBERT that classifies emotions from audio inputs.

Approach

  1. Dataset: The RAVDESS dataset, comprising 1,440 audio files labeled with one of 8 emotions: calm, happy, sad, angry, fearful, surprise, neutral, and disgust.
  2. Model Fine-Tuning: The DistilHuBERT model was fine-tuned for 7 epochs with a learning rate of 5e-5, achieving an accuracy of 98% on the test set.

Data Preprocessing

  • Sampling Rate: Audio files were resampled to 16 kHz to match the model's requirements.
  • Padding: Audio clips shorter than 30 seconds were zero-padded to that length.
  • Train-Test Split: 80% of the samples were used for training, and 20% for testing.
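
The preprocessing steps above can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the author's code: the helper names (`resample_linear`, `pad_to_length`, `train_test_indices`) are hypothetical, linear interpolation stands in for a proper resampler, and both the truncation of clips longer than 30 seconds and the random seed are assumptions.

```python
import numpy as np

TARGET_SR = 16_000                      # sampling rate stated above
MAX_SECONDS = 30                        # padding length stated above
MAX_SAMPLES = TARGET_SR * MAX_SECONDS


def resample_linear(wave, orig_sr, target_sr=TARGET_SR):
    """Resample by linear interpolation (a simple stand-in for a real resampler)."""
    wave = np.asarray(wave, dtype=np.float32)
    if orig_sr == target_sr:
        return wave
    n_out = int(round(len(wave) * target_sr / orig_sr))
    t_in = np.arange(len(wave)) / orig_sr      # timestamps of input samples
    t_out = np.arange(n_out) / target_sr       # timestamps of output samples
    return np.interp(t_out, t_in, wave).astype(np.float32)


def pad_to_length(wave, max_samples=MAX_SAMPLES):
    """Zero-pad short clips; truncating longer ones is an assumption."""
    wave = np.asarray(wave, dtype=np.float32)[:max_samples]
    return np.pad(wave, (0, max_samples - len(wave)))


def train_test_indices(n_items, test_fraction=0.2, seed=0):
    """Shuffle item indices and split them into (train, test)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_test = int(round(n_items * test_fraction))
    return order[n_test:], order[:n_test]
```

For example, `train_test_indices(1440)` yields 1,152 training indices and 288 test indices, matching the 80/20 split on the 1,440 files.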

Model Architecture

  • DistilHuBERT: A lightweight variant of HuBERT, fine-tuned for emotion classification.
  • Fine-Tuning Setup:
    • Optimizer: AdamW
    • Loss Function: Cross-Entropy
    • Learning Rate: 5e-5
    • Warm-up Ratio: 0.1
    • Epochs: 7
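
To illustrate how a warm-up ratio of 0.1 interacts with the peak learning rate of 5e-5, here is a small sketch of a learning-rate schedule. The source only states the warm-up ratio, so the linear decay after warm-up is an assumption, and `lr_at_step` is a hypothetical helper rather than part of any library.

```python
import math


def lr_at_step(step, total_steps, peak_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at a given optimizer step.

    Linear warm-up from 0 to peak_lr over the first warmup_ratio of steps,
    then linear decay back to 0 (the decay shape is an assumption).
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    decay_steps = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)


# Print the rate at a few points of a hypothetical 1,000-step run.
for step in (0, 50, 100, 500, 1000):
    print(step, lr_at_step(step, total_steps=1000))
```

Under these assumptions the rate peaks at the end of warm-up, about 10% of the way through training, and falls steadily afterwards.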

Results

  • Accuracy: 0.98 on the test dataset
  • Loss: 0.10 on the test dataset
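
For readers unfamiliar with these two metrics, the sketch below shows one way to compute cross-entropy loss and accuracy from raw model logits. It is an illustrative NumPy version, not the evaluation code used for the numbers above; `cross_entropy_and_accuracy` is a hypothetical helper and the toy logits are made up.

```python
import numpy as np


def cross_entropy_and_accuracy(logits, labels):
    """Return (mean cross-entropy loss, accuracy) for a batch.

    logits: array of shape (n_samples, n_classes)
    labels: integer class ids of shape (n_samples,)
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Log-softmax, shifted by the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    accuracy = (logits.argmax(axis=1) == labels).mean()
    return float(loss), float(accuracy)


# Toy batch: two samples, two classes, both with true label 0.
loss, acc = cross_entropy_and_accuracy([[2.0, 0.0], [0.0, 2.0]], [0, 0])
print(loss, acc)
```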

Usage

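from transformers import pipeline

# Load the fine-tuned checkpoint as an audio-classification pipeline.
pipe = pipeline(
    "audio-classification",
    model="BilalHasan/distilhubert-finetuned-ravdess",
)

# Replace this placeholder with the path to your own audio file.
path_to_your_audio = "path/to/your_audio.wav"

# The result is a list of dicts, each with a "label" and a "score".
emotion = pipe(path_to_your_audio)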
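
The pipeline call returns a list of dictionaries with `label` and `score` keys. As a small sketch of how that result might be reduced to a single prediction, the hypothetical helper `top_emotion` below picks the highest-scoring label. The example scores are invented for illustration and are not real model output, and the `min_score` threshold is an assumption.

```python
def top_emotion(predictions, min_score=0.0):
    """Return the highest-scoring label, or None if nothing clears min_score."""
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None


# Invented example in the shape the pipeline returns.
example_output = [
    {"label": "happy", "score": 0.91},
    {"label": "surprise", "score": 0.06},
    {"label": "calm", "score": 0.03},
]
print(top_emotion(example_output))
```

With the real pipeline, the same helper could be called as `top_emotion(pipe(path_to_your_audio))`.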

Demo

You can access the live demo of the app on Hugging Face Spaces.
