Add pipeline tag, library name and clarify license
This PR adds the missing `pipeline_tag` and `library_name` metadata, making the model easier to discover on the Hugging Face Hub. It also clarifies the license, specifying that the code is MIT-licensed while the checkpoints are for non-commercial use only.
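For quick reference, these fields live in the YAML front matter at the top of README.md, which the Hub parses for model-card metadata. The block below simply mirrors the values added in the diff:

```yaml
---
pipeline_tag: audio-text-to-text
library_name: transformers
license: mit
---
```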
README.md
CHANGED
@@ -1,10 +1,16 @@
+---
+pipeline_tag: audio-text-to-text
+library_name: transformers
+license: mit
+---
+
 # PyTorch Implementation of Audio Flamingo 2

 **Sreyan Ghosh, Zhifeng Kong, Sonal Kumar, S Sakshi, Jaehyeon Kim, Wei Ping, Rafael Valle, Dinesh Manocha, Bryan Catanzaro**

 [[paper]](https://arxiv.org/abs/2503.03983) [[Demo website]](https://research.nvidia.com/labs/adlr/AF2/) [[GitHub]](https://github.com/NVIDIA/audio-flamingo)

-This repo contains the PyTorch implementation of [Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities](). Audio Flamingo 2 achieves
+This repo contains the PyTorch implementation of [Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities](https://arxiv.org/abs/2503.03983). Audio Flamingo 2 achieves state-of-the-art performance across over 20 benchmarks, using only a 3B parameter small language model. It is improved from our previous [Audio Flamingo](https://arxiv.org/abs/2402.01831).

 - We introduce two datasets, AudioSkills for expert audio reasoning, and LongAudio for long audio understanding, to advance the field of audio understanding.

@@ -34,7 +40,7 @@ Audio Flamingo 2 uses a cross-attention architecture similar to [Audio Flamingo]

 ## License

-
+The code in this repo is under MIT license. The checkpoints are for non-commercial use only (see NVIDIA OneWay Noncommercial License). They are also subject to the [Qwen Research license](https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE), the [Terms of Use](https://openai.com/policies/terms-of-use) of the data generated by OpenAI, and the original licenses accompanying each training dataset.
 - Notice: Audio Flamingo 2 is built with Qwen-2.5. Qwen is licensed under the Qwen RESEARCH LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
