Update README.md (#2)
- Update README.md (2654f85152309f063ece1333ffab3a02a6ef7adf)
Co-authored-by: Grant Lee <[email protected]>
README.md
CHANGED
@@ -4,15 +4,15 @@ library_name: transformers
 tags: []
 ---
 
-# R1-AQA
+# R1-AQA --- Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
 
 <!-- Provide a quick summary of what the model is/does. -->
 
 ## Introduction
 
 R1-AQA is a audio question answering (AQA) model based on `Qwen2-Audio-7B-Instruct`, optimized through reinforcement learning using the group relative policy optimization (GRPO) algorithm.
-This implementation achieved state-of-the-art performance on MMAU *
-For more details, please refer to our [Github](https://github.com/xiaomi/r1-aqa) and [Report]().
+This implementation has achieved state-of-the-art performance on MMAU *Test-mini* benchmark with only 38k post-training samples.
+For more details, please refer to our [Github](https://github.com/xiaomi/r1-aqa) and [Technical Report](https://arxiv.org/abs/2503.11197).
 
 ### Table: Accuracies (%) on MMAU Test-mini benchmark
 | Model | Method | Sound | Music | Speech | Average |
@@ -29,7 +29,7 @@ For more details, please refer to our [Github](https://github.com/xiaomi/r1-aqa)
 | SALMONN | Direct Inference\* | 41.00 | 34.80 | 25.50 | 33.70 |
 | Qwen2-Audio-7B-Instruct | CoTA \[1\] | 60.06 | 64.30 | 60.70 | 61.71 |
 | Qwen2-Audio-7B-Instruct | Zero-Shot-CoT \[2\] | 61.86 | 56.29 | 55.26 | 57.80 |
-| Qwen2-Audio-7B-Instruct
+| **Qwen2-Audio-7B-Instruct** | **GRPO (Ours)** | **69.37** | 66.77 | 57.36 | **64.50** |
 
 #### Notes:
 \* The data are sourced from the MMAU official website: [https://sakshi113.github.io/mmau_homepage/](https://sakshi113.github.io/mmau_homepage/)