---
extra_gated_prompt: Please read Apache License, Version 2.0 before downloading this model.
extra_gated_fields:
  Country: country
  Affiliation: text
  I agree ALL the statements in Apache License, Version 2: checkbox
extra_gated_button_content: Acknowledge license
license: apache-2.0
language:
- ja
metrics:
- accuracy
base_model:
- imprt/kushinada-hubert-large
pipeline_tag: audio-classification
tags:
- hubert
- speech
- superb
- audio-classification
---

# JTES speech emotion recognition model for S3PRL using kushinada-hubert-large

`imprt/kushinada-hubert-large-jtes-er`

This is a speech emotion recognition model for [S3PRL](https://github.com/s3prl/s3prl/), trained on JTES (Japanese Twitter-based Emotional Speech v1.1) with [imprt/kushinada-hubert-large](https://huggingface.co/imprt/kushinada-hubert-large) as the upstream model.

JTES is an emotional speech database that can be used for emotion recognition as well as for the recognition and synthesis of speech with various emotions. The database was designed by compiling tweets acquired from Twitter (X) and selecting emotion-dependent tweets while considering phonetic and prosodic balance [1].

## Demo: How to use with S3PRL

```bash
cd s3prl
git checkout v0.4.17
# Copy all files from this repository into the S3PRL tree.
# Store imprt/kushinada-hubert-large's kushinada-hubert-large-s3prl.pt in the s3prl/upstream_models/ directory.
# Store JTES (v1.1) in "s3prl/jtes_v1.1".
pip install -e ".[all]"
cd s3prl
# Evaluate session 1 (fold1)
python3 run_downstream.py -m evaluate -e result/downstream/kushinada-hubert-large-jtes-er_fold1/dev-best.ckpt -o "config.downstream_expert.datarc.eval_batch_size=1"
```

## Results

### Accuracy

|session|accuracy|
|---|---|
|session1|0.8446|
|session2|0.8471|
|session3|0.8725|
|session4|0.7925|
|session5|0.8822|
|average|0.8477|

## Citation

### Citing SUPERB

```BibTex
@article{yang2021superb,
  title={SUPERB: Speech processing Universal PERformance Benchmark},
  author={Yang, Shu-wen and Chi, Po-Han and Chuang, Yung-Sung and Lai, Cheng-I Jeff and Lakhotia, Kushal and Lin, Yist Y and Liu, Andy T and Shi, Jiatong and Chang, Xuankai and Lin, Guan-Ting and others},
  journal={arXiv preprint arXiv:2105.01051},
  year={2021}
}
```

### Citing JTES

[1] Emika Takeishi, Takashi Nose, Yuya Chiba, Akinori Ito, "Construction and Analysis of Phonetically and Prosodically Balanced Emotional Speech Database", Proceedings of Oriental COCOSDA, pp. 16-21, 2016.

```BibTex
@INPROCEEDINGS{7918977,
  author={Takeishi, Emika and Nose, Takashi and Chiba, Yuya and Ito, Akinori},
  booktitle={2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA)},
  title={Construction and analysis of phonetically and prosodically balanced emotional speech database},
  year={2016},
  volume={},
  number={},
  pages={16-21},
  keywords={Speech;Databases;Speech recognition;Entropy;Emotion recognition;Context;Standardization;emotional speech database;speech corpus;emotion recognition;emotional speech recognition},
  doi={10.1109/ICSDA.2016.7918977}}
```

## License

[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
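## Appendix: Reproducing the reported average

As a quick sanity check on the accuracy table, the reported session average can be recomputed from the five per-session scores. A minimal sketch (the exact mean is 0.84778, so the card's 0.8477 appears to be a truncated rather than rounded value):

```python
# Per-session accuracies copied from the results table above.
accuracies = {
    "session1": 0.8446,
    "session2": 0.8471,
    "session3": 0.8725,
    "session4": 0.7925,
    "session5": 0.8822,
}

# Unweighted mean over the five cross-validation sessions.
mean = sum(accuracies.values()) / len(accuracies)
print(f"average accuracy: {mean:.5f}")  # average accuracy: 0.84778
```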