---
license: mit
datasets:
- ccmusic-database/bel_canto
language:
- en
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- music
- art
---
# Intro
The Classical and Ethnic Vocal Style Classification model distinguishes classical (bel canto) from ethnic vocal styles; all audio samples are sung by professional vocalists. The model is fine-tuned on an audio dataset covering four categories, pre-processed into spectrograms. The backbone network is first pretrained on computer vision (CV) tasks, which gives it general-purpose feature extractors; fine-tuning then adapts those features to the subtle stylistic differences between classical and ethnic singing. Because spectrograms encode both the temporal and the frequency structure of an audio signal, the CV backbone can analyze vocal recordings as images.

By accurately sorting performances into these two broad traditions, the model has practical value for the music industry and for cultural preservation, and its CV-pretrained foundation illustrates how readily neural networks transfer across domains.
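The card describes the spectrogram pre-processing without showing it, so here is a minimal sketch of turning a recording into a Mel-spectrogram image. It assumes librosa and matplotlib with default parameters; the placeholder file names and the exact settings may differ from the dataset's actual pre-processing.

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load a vocal recording (placeholder path) and compute a Mel spectrogram,
# the input representation with the best results in the table below.
y, sr = librosa.load("vocal_sample.wav")
mel = librosa.feature.melspectrogram(y=y, sr=sr)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Render the spectrogram as an image so a CV backbone can consume it.
librosa.display.specshow(mel_db, sr=sr)
plt.axis("off")
plt.savefig("vocal_sample.jpg", bbox_inches="tight", pad_inches=0)
plt.close()
```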
## Demo
<https://huggingface.co/spaces/ccmusic-database/bel_canto>
## Usage
```python
from modelscope import snapshot_download

# Download the model repository from ModelScope; returns the local cache path.
model_dir = snapshot_download("ccmusic-database/bel_canto")
```
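The snippet above only downloads the repository; how the weights are loaded depends on the training code (see the evaluation repo below). The following is a hedged sketch assuming the best checkpoint is a torchvision GoogLeNet state dict with a four-class head; the file name `save.pt`, the class count, and the preprocessing are assumptions, not guaranteed by this card.

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models import googlenet

# Hypothetical checkpoint path inside the downloaded snapshot.
state = torch.load(f"{model_dir}/save.pt", map_location="cpu")

# Rebuild the backbone with a 4-class head and load the fine-tuned weights.
model = googlenet(num_classes=4, aux_logits=False, init_weights=False)
model.load_state_dict(state)
model.eval()

# Classify a spectrogram image produced as in the sketch above.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
x = preprocess(Image.open("vocal_sample.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    pred = model(x).argmax(dim=1).item()
print(pred)
```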
## Maintenance
```bash
git clone git@hf.co:ccmusic-database/bel_canto
cd bel_canto
```
## Results
Classification accuracy of each backbone by input spectrogram type (best per column within each group in bold):
| Backbone | Mel | CQT | Chroma |
| :-----------: | :-----------------------: | :-------: | :-------: |
| Swin-S | **0.928** | **0.936** | **0.787** |
| Swin-T | 0.906 | 0.863 | 0.731 |
| | | | |
| AlexNet | 0.919 | 0.920 | 0.746 |
| ConvNeXt-T | 0.895 | 0.925 | 0.714 |
| GoogLeNet     | [**0.948**](#best-result) | 0.921     | 0.739     |
| MNASNet1.3 | 0.931 | **0.931** | **0.765** |
| SqueezeNet1.1 | 0.923 | 0.914 | 0.685 |
| Average | 0.921 | 0.916 | 0.738 |
### Best Result
<style>
#bel td {
vertical-align: middle !important;
text-align: center;
}
#bel th {
text-align: center;
}
</style>
<table id="bel">
<tr>
<th>Loss curve</th>
<td><img src="https://www.modelscope.cn/models/ccmusic-database/bel_canto/resolve/master/googlenet_mel_2024-07-30_00-51-26/loss.jpg"></td>
</tr>
<tr>
<th>Training and validation accuracy</th>
<td><img src="https://www.modelscope.cn/models/ccmusic-database/bel_canto/resolve/master/googlenet_mel_2024-07-30_00-51-26/acc.jpg"></td>
</tr>
<tr>
<th>Confusion matrix</th>
<td><img src="https://www.modelscope.cn/models/ccmusic-database/bel_canto/resolve/master/googlenet_mel_2024-07-30_00-51-26/mat.jpg"></td>
</tr>
</table>
## Dataset
<https://huggingface.co/datasets/ccmusic-database/bel_canto>
## Mirror
<https://www.modelscope.cn/models/ccmusic-database/bel_canto>
## Evaluation
<https://github.com/monetjoe/ccmusic_eval>
## Cite
```bibtex
@article{Zhou-2025,
  title   = {CCMusic: An Open and Diverse Database for Chinese Music Information Retrieval Research},
  author  = {Monan Zhou and Shenyang Xu and Zhaorui Liu and Zhaowen Wang and Feng Yu and Wei Li and Baoqiang Han},
journal = {Transactions of the International Society for Music Information Retrieval},
year = {2025}
}
``` |