---
language: en
license: mit
tags:
- audio
- captioning
- text
- audio-captioning
- automated-audio-captioning
task_categories:
- audio-captioning
---

# CoNeTTE (ConvNext-Transformer with Task Embedding) for Automated Audio Captioning

This model generate a short textual description of any audio file.

## Installation
```bash
pip install conette
```

## Usage
```py
from conette import CoNeTTEConfig, CoNeTTEModel

config = CoNeTTEConfig.from_pretrained("Labbeti/conette")
model = CoNeTTEModel.from_pretrained("Labbeti/conette", config=config)

path = "/my/path/to/audio.wav"
outputs = model(path)
cands = outputs["cands"][0]
print(cands)
```

## Performance
TODO

## Additional information

The encoder part of the architecture is based on a ConvNeXt model for audio classification, available here: https://huggingface.co/topel/ConvNeXt-Tiny-AT.
The encoder weights used are named "convnext_tiny_465mAP_BL_AC_70kit.pth", available on Zenodo: https://zenodo.org/record/8020843.

It was created by [@Labbeti](https://hf.co/Labbeti).