|
|
--- |
|
|
license: mit |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- Qwen/Qwen2-Audio-7B-Instruct |
|
|
--- |
|
|
|
|
|
- Data processing
|
|
|
|
|
- Place the WAV and JSON files in `dev_data`. |
|
|
|
|
|
To distinguish the recognition performance of each part, the training audio file names must carry a fold prefix: Part One files are prefixed with `fold1-d-`, Part Two files with `fold1-a-`, `fold1-b-`, or `fold1-c-`, and Part Three files with `fold1-e-`. If the Part One and Part Three file names do not already have the `fold1-d-` and `fold1-e-` prefixes, add them yourself (see the sketch below). For example, a Part One training file named `5402400A` becomes `fold1-d-5402400A`, and a Part Three file named `audio_0001405` becomes `fold1-e-audio_0001405`. The development set follows the same convention, except that `fold1` is replaced with `fold2`.
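The renaming can be scripted before the files are moved into `dev_data`. A minimal sketch, assuming the Part One training WAV files are staged in a hypothetical `part_one_train/` directory (adjust the directory and prefix for the other parts, and use `fold2` for the development set):

```bash
# Hypothetical helper: prepend fold1-d- to the Part One training WAV files.
# part_one_train/ is an assumed staging directory; adjust DIR and PREFIX as needed.
DIR=part_one_train
PREFIX=fold1-d-
for f in "$DIR"/*.wav; do
  [ -e "$f" ] || continue                        # no matching files: nothing to do
  base=$(basename "$f")
  case "$base" in "$PREFIX"*) continue ;; esac   # skip files that are already prefixed
  mv "$f" "$DIR/$PREFIX$base"
done
```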
|
|
|
|
|
Download the pre-trained Sentence-BERT model and tokenizer from the URL below, and place them inside the `../../qwen2_audio_baseline/Bert_pretrain` directory.
|
|
|
|
|
- Example commands |
|
|
|
|
|
```bash
|
|
git clone https://huggingface.co/PeacefulData/2025_DCASE_AudioQA_Baselines |
|
|
cd 2025_DCASE_AudioQA_Baselines |
|
|
mkdir Bert_pretrain |
|
|
cd Bert_pretrain |
|
|
git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
|
|
``` |
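
The weight files in the Sentence-BERT repository are stored with Git LFS, so if the cloned `.bin`/`.safetensors` files appear as small pointer files rather than the actual weights, set up Git LFS and fetch them:

```bash
# One-time setup if Git LFS is not yet installed/initialized on this machine.
git lfs install
# Then re-run the clone, or fetch the large files inside the cloned directory:
cd all-MiniLM-L6-v2 && git lfs pull
```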
|
|
|
|
|
|
|
|
- Environment |
|
|
|
|
|
```bash |
|
|
cd ../qwen2_audio_baseline |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
You can also install from a PyPI mirror to speed up the download.
|
|
`pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple` |
|
|
|
|
|
- Run Audio QA Inference Baseline |
|
|
```bash |
|
|
sh qwen_audio_test.sh |
|
|
``` |