@@ -27,27 +27,20 @@ Notable differences from other available models include:
 1. Performance: CED with 10M parameters outperforms the majority of previous approaches (~80M).
 ### Model Sources
-- **Original Repository:** https://github.com/RicherMans/CED
-- **Repository:** https://github.com/jimbozhang/hf_transformers_custom_model_ced
+- **Repository:** https://github.com/RicherMans/CED
 - **Paper:** [CED: Consistent ensemble distillation for audio tagging](https://arxiv.org/abs/2308.11957)
 - **Demo:** https://huggingface.co/spaces/mispeech/ced-base
-## Install
-```bash
-pip install git+https://github.com/jimbozhang/hf_transformers_custom_model_ced.git
-```
 ## Inference
 ```python
->>> from ced_model.feature_extraction_ced import CedFeatureExtractor
->>> from ced_model.modeling_ced import CedForAudioClassification
+>>> from transformers import AutoModelForAudioClassification, AutoFeatureExtractor
 >>> model_name = "mispeech/ced-base"
->>> feature_extractor = CedFeatureExtractor.from_pretrained(model_name)
->>> model = CedForAudioClassification.from_pretrained(model_name)
+>>> feature_extractor = AutoFeatureExtractor.from_pretrained(model_name, trust_remote_code=True)
+>>> model = AutoModelForAudioClassification.from_pretrained(model_name, trust_remote_code=True)
 >>> import torchaudio
->>> audio, sampling_rate = torchaudio.load("resources/JeD5V5aaaoI_931_932.wav")
+>>> audio, sampling_rate = torchaudio.load("/path-to/JeD5V5aaaoI_931_932.wav")
 >>> assert sampling_rate == 16000
 >>> inputs = feature_extractor(audio, sampling_rate=sampling_rate, return_tensors="pt")