--- language: en license: mit tags: - audio - captioning - text - audio-captioning - automated-audio-captioning task_categories: - audio-captioning --- # CoNeTTE (ConvNext-Transformer with Task Embedding) for Automated Audio Captioning This model generate a short textual description of any audio file. ## Installation ```bash pip install conette ``` ## Usage ```py from conette import CoNeTTEConfig, CoNeTTEModel config = CoNeTTEConfig.from_pretrained("Labbeti/conette") model = CoNeTTEModel.from_pretrained("Labbeti/conette", config=config) path = "/my/path/to/audio.wav" outputs = model(path) cands = outputs["cands"][0] print(cands) ``` ## Performance TODO ## Additional information The encoder part of the architecture is based on a ConvNeXt model for audio classification, available here: https://huggingface.co/topel/ConvNeXt-Tiny-AT. The encoder weights used are named "convnext_tiny_465mAP_BL_AC_70kit.pth", available on Zenodo: https://zenodo.org/record/8020843. It was created by [@Labbeti](https://hf.co/Labbeti).